Introduction
Hey Everyone!
This is a follow-up to my previous article ("Bringing it up to Eleven") about improving quality and consistency in our generations.
The link to the previous article is here: https://civitai.com/articles/1721/improving-results-by-using-multiple-models-of-the-same-concept-turning-it-to-11 (which is still worth reading).
Once I found out, and confirmed with many people, that combining models of the same concept improves quality, I focused on exploring this idea further.
Before you delve deep into the article, you may take a look at some of the samples I've made using the methods described below: https://civitai.com/posts/1078916
Using multiple models to achieve better results
As you know, I have also ventured into LoRA and, later, Embeddings territory. Initially I used single LyCORIS models on their own, but around the time of the first article I started mixing them to see what would happen and was, quite frankly, stunned.
The results were better pretty much across the board - not only was I achieving better likeness, but the results were also more consistent.
I think consistency got the bigger boost: in some cases I wasn't getting the best likeness, but I was getting pretty much the same face every time.
For those who want to skip the first article, the gist is: I was combining two LyCORIS at 0.65-0.7 weight each (a sum of 1.3 - 1.4), three LyCORIS at around 0.5 each (a sum of 1.5), or even four at around 0.3 - 0.35 each (a sum of 1.2 - 1.4).
Using this method I was finally able to capture certain people who had been very difficult to get right (my record was 6 models, at various weights that summed to around 1.5 in total).
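The weight budgets described above can be sketched in a few lines of Python. This is only an illustration: the helper names, the placeholder model names, and the idea of an exactly even split are my assumptions, and in practice the weights are tuned by eye.

```python
def even_weights(model_names, total=1.5):
    """Split a total weight budget evenly across models of the same concept.

    The 1.5 default is the rough target sum mentioned in the text, not a hard rule.
    """
    if not model_names:
        raise ValueError("need at least one model name")
    per_model = round(total / len(model_names), 2)
    return {name: per_model for name in model_names}


def to_prompt_tags(weights):
    """Render {name: weight} as A1111-style <lora:name:weight> tags."""
    return " ".join(f"<lora:{name}:{weight}>" for name, weight in weights.items())


# Placeholder model names, for illustration only.
print(to_prompt_tags(even_weights(["person_v1", "person_v2", "person_v3"])))
```

An even split is just a starting point; the six-model case mentioned above used various weights that only had to land near the same total.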
Later I started adding some LoRAs to the mix and found that they also work really nicely. I realized that there is an additional benefit to mixing the two types.
LoRAs are usually more stylized - they capture the likeness of a person but in a less photorealistic sense than a LyCORIS does. LyCORIS seems to get all the wrinkles, moles, and pores of the skin (sometimes even too much, which was a frequent complaint).
But by mixing LoRA and LyCORIS we get the best of both worlds: we keep the likeness and the consistency, and by playing with weights we can add or lose some wrinkles/pores depending on our needs. Do we want to make someone younger? We add some tokens about the age but also increase the weight of the LoRA rather than the LyCORIS.
This on its own gives us a lot of flexibility, but this is not where we stop! There is something more that can be added, so of course I had to test it :)
I was going on holiday so I figured I would set up a bigger training job, and over 2 weeks I trained about 750 Embeddings.
Some of them are quite good on their own, and for something that only weighs 8KB they might be good enough. But not for us, who are trying to get to perfection :)
I started experimenting with mixing all three together and the results were also spectacular. To be fair - you can get a really great likeness with just the LoRA/LyCORIS combination but adding Embeddings seems to be an easy win.
After a bit of testing I quickly figured out that a nice starting point is the Embedding at full weight, the LyCORIS at 0.4, and the LoRA at 0.2, like this:
alicia vikander <lora:lora-small-alicia-vikander-v1:0.2> <lora:locon_aliciavikander_v1_from_v1_64_32:0.4> aliciavikander-ti
I guess it is also time for a disclaimer (and an advert? :P) - I was mainly testing this on my own models. You can mix other models as well (just remember that their triggers are most likely different from my standardized ones).
I am also mainly testing on Serenity (originally v1 and later v2), but other photorealistic models work fine as well. The default ratios are, however, based on Serenity (version v2 was additionally fine-tuned on many celebrity datasets).
If you want to check other models, especially those that are less photorealistic - you should probably give more weight to LoRA/LyCORIS (and/or decrease weight on the embedding).
In the example above, "aliciavikander-ti" is my embedding, "<lora:lora-small-alicia-vikander-v1:0.2>" is the LoRA, and "<lora:locon_aliciavikander_v1_from_v1_64_32:0.4>" is the LyCORIS. "alicia vikander" is just a token trained into Serenity; most models already know some celebrities, so adding the name helps a little bit too.
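If you want to generate this kind of prompt for many subjects, the starting-point recipe can be written as a small helper. This is a sketch, not a tool I ship: the function name and its parameters are made up for illustration, and the default weights are simply the 0.4 / 0.2 starting point from the text.

```python
def build_subject_prompt(name, lora, lycoris, embedding,
                         lora_weight=0.2, lycoris_weight=0.4):
    """Assemble the name token, LoRA tag, LyCORIS tag and embedding trigger.

    The embedding is left unweighted, i.e. at full weight.
    """
    parts = [
        name,
        f"<lora:{lora}:{lora_weight}>",
        f"<lora:{lycoris}:{lycoris_weight}>",
        embedding,
    ]
    return " ".join(parts)


print(build_subject_prompt(
    "alicia vikander",
    lora="lora-small-alicia-vikander-v1",
    lycoris="locon_aliciavikander_v1_from_v1_64_32",
    embedding="aliciavikander-ti",
))
```

For less photorealistic checkpoints you would pass higher `lora_weight` / `lycoris_weight` values, as described above.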
As I've said, Serenity v2 was trained on multiple (around 400-500) celebrities; some of them look really good without additional models, while others still need a bit of help to achieve a better likeness.
And the advert part: I'm uploading models here to Civitai but there are so many models that I can't keep up. So, if you want early access to many of my models - you can find them on my buymeacoffee page.
Recently I've made a small static site that lists all my models, lets you know who was trained, and gives you direct Civitai links: https://malcolmrey.tiiny.site/
It has a quick (really quick :P) search engine so you can check who was already trained. You can always ask me to upload a given model and I will try to prioritize it, but like I said - if you want some models sooner - you can support me on the bmc page (https://www.buymeacoffee.com/malcolmrey)
By quickly checking this small site, we can see that I've already uploaded 16 models to Civitai that have all three model types (LoRA/LyCORIS/Embedding), and I decided to make some samples with them so you can play with them as well :)
But like I said - you can use other models, so you could take an embedding made by someone else and boost the likeness of the subject by combining it with my models. Checking the site quickly again - there are 110 models that I've uploaded with both LoRA and LyCORIS.
The site also shows that I have made a total of 836 unique models (excluding the style, context, etc. models). So, there are a lot of them in the upload queue.
Bear in mind that the default is just a starting point, which is quite good for many models already.
But some models are tricky. For instance, I get a very good likeness of Felicia Day using this combination: <lora:locon_feliciaday_v2_from_v2_64_32:0.75> <lora:locon_feliciaday_v1_from_v6_64_32:0.75> <lora:lora-small-felicia-day-v1:0.25>
But applying the default like this:
felicia day <lora:lora-small-felicia-day-v1:0.2> <lora:locon_feliciaday_v2_from_v2_64_32:0.4> feliciaday-ti
Makes her barely recognizable. I will be investigating that; I have two guesses. Felicia Day may not be well represented in the base models, so the Embedding does not work as well as in other cases. And/or my models are undertrained, which would explain why the LoRA/LyCORIS combo that works needs a sum of 1.75 (as you remember, I was earlier talking about keeping the total sum at around 1.5 and not more).
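To check where a prompt sits relative to those sums, you can pull the weights straight out of the `<lora:name:weight>` tags. A minimal sketch follows; the regular expression only covers the simple tag form used in this article, and the function names are my own.

```python
import re

# Matches the simple <lora:name:weight> form used in the prompts above.
LORA_TAG = re.compile(r"<lora:([^:>]+):([0-9.]+)>")


def lora_weights(prompt):
    """Return a list of (name, weight) pairs found in the prompt."""
    return [(name, float(weight)) for name, weight in LORA_TAG.findall(prompt)]


def total_weight(prompt):
    """Sum of all LoRA/LyCORIS weights in the prompt, rounded to 2 decimals."""
    return round(sum(weight for _, weight in lora_weights(prompt)), 2)


strong = ("<lora:locon_feliciaday_v2_from_v2_64_32:0.75> "
          "<lora:locon_feliciaday_v1_from_v6_64_32:0.75> "
          "<lora:lora-small-felicia-day-v1:0.25>")
default = ("felicia day <lora:lora-small-felicia-day-v1:0.2> "
           "<lora:locon_feliciaday_v2_from_v2_64_32:0.4> feliciaday-ti")

print(total_weight(strong))   # 1.75
print(total_weight(default))  # 0.6
```

The embedding trigger is plain text, so it is not counted in the sum.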
Ok, so why should we be adding Embeddings if we can already achieve great results with LoRAs and LyCORIS mixed together?
Well, the obvious reason is that Embeddings are very small - you can pretty much keep all the models in one place without worrying. All of my 800 embeddings only use 20 MB of space. On the other hand, my 800 small LoRAs need 4 GB (which is still manageable), but the 800 LyCORIS take up 80 GB :>
The second obvious reason is that it adds flexibility - you can pick and choose from different models as you wish and you can experiment to get what you want.
But there is a third reason. If you have experimented with multiple LoRAs then you know that adding more and more tends to introduce some weird artifacts. Even with only my models, if the weights sum to close to 2 or beyond, you will see overtraining symptoms.
This is because you cannot keep adding LoRAs indefinitely. At some point, the whole thing will just collapse. You have most likely seen LoRAs that are overtrained by default and need to be used at very low weights, otherwise they introduce too many artifacts or override things you do not want to change.
Some Loras work fine on their own but when you start adding other loras - they become unusable.
Here is where Embeddings shine: because you combine the Embedding with the other models, those other models need much lower weights (a sum of around 0.6 instead of 1.5 in most of my examples). This means we make room for other, more difficult LoRAs :)
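The "making room" argument is simple arithmetic. The sketch below treats a sum of 2 as the point where artifacts start; that is only the rough threshold I mentioned for my own models, not a universal limit, and the constant and function name are made up for illustration.

```python
# Assumption: rough point where overtraining symptoms appear with my models.
ARTIFACT_CEILING = 2.0


def headroom(subject_weights, ceiling=ARTIFACT_CEILING):
    """Weight left for other LoRAs (style, detail, etc.) under the ceiling."""
    return round(ceiling - sum(subject_weights), 2)


print(headroom([0.75, 0.75]))  # two LyCORIS, no embedding -> 0.5
print(headroom([0.4, 0.2]))    # LyCORIS + LoRA alongside an embedding -> 1.4
```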
To quickly summarize: using multiple models of the same concept improves quality in two ways: it brings the likeness closer and makes the outputs more consistent.
Distance and likeness of the subject
From time to time there were comments that my models work well up close and mainly generate faces, but that it all falls apart when people want to use them in more complex scenes.
It is well known that when SD tries to generate faces far away, they are quite often unrecognizable. One could certainly train using datasets that have more images from far away, but the results aren't impressive.
There are at least four common ways of fixing this issue, however:
1) Use some kind of upscaling with denoise. In A1111 that will in most cases be Hires. fix (but there is also a script called loopback scaler which I hear is pretty great!)
Remember to increase the denoise the further away the person is and decrease it if the person is rather close.
2) Use ADetailer or inpainting (ADetailer is pretty much automatic inpainting). I love ADetailer as you can control quite a lot in there and get really good results with it. Remember that it is not always required and sometimes it is better not to use it. Knowing when comes with experience.
3) ControlNet - this will help you if the models are fixated on the faces and you want different scenes/positions.
4) You can increase the base resolution a bit. We were taught that SD 1.5 is best at around 512 (and usually 512x768 or something like that). Well, on some models that were finetuned and/or merged with many other models - you can try a bit more without worrying about double heads.
Using Serenity I was able to generate even 700x1000 without upscaling, and although I sometimes got doubled heads or very long necks, I was getting good images 80% of the time (and remember that the time to get those is way shorter than with Hires. fix)
This was the limit I was comfortable with, but probably something like 650x900 would be a nice compromise :)
And of course - you can combine those solutions together!
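To put the resolution tip in perspective, here is the pixel count of those sizes relative to the usual 512x768. It is just arithmetic, and the helper name is made up for illustration.

```python
def pixel_ratio(width, height, base=(512, 768)):
    """How many times more pixels a size has than the base resolution."""
    return round((width * height) / (base[0] * base[1]), 2)


print(pixel_ratio(700, 1000))  # 1.78
print(pixel_ratio(650, 900))   # 1.49
```

So 700x1000 asks the model for about 78% more pixels in a single pass than 512x768, and 650x900 for about 49% more, which fits with the larger size being where doubled heads started to show up.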
Additional Tips
Some people asked me what I do nowadays to get even better quality. Well, here are some of my tips:
1) Eyes.
Originally people were using tokens like "perfect eyes" and "good iris" in positives and something like "bad eyes" and "deformed eyes" in negatives. That had little to no effect (I would even say it was more of a placebo).
Then we got the VAE and it worked wonders with the eyes (the quality was definitely improved). But we weren't there yet.
So, some people (including myself) made models specifically for the eyes. I made the Lora/LyCORIS named "Perfect Eyes" and I am really happy that those are still quite popular :)
Initially, my suggestion was to use it in the base prompt with a lower weight (and it does work). When ADetailer came along, my suggestion was to increase the weight there.
But now I will give you an even better tip :-)
There is an ADetailer model specifically for the eyes: mediapipe_face_mesh_eyes_only
You would add it as an additional ADetailer model (besides the models that correct the face, hands, and whatnot) and use this in its positive prompt:
"photo of perfecteyes eyes <lora:locon_perfecteyes_v1_from_v1_64_32:0.7> , perfect eyes"
and this in negatives: "BadDream, (UnrealisticDream:1.2), realisticvision-negative-embedding, badIrisNeg"
If you prefer Lora instead of LyCORIS - that is perfectly fine (or you could even try Loras from other people, but I use mine and they work really great :P)
As for the negatives, the first three are the default combo negatives used in many photorealistic models. But the important one is the last one: "badIrisNeg" - it is available on civitai and it does indeed help a bit.
2) Skin Conditioner Slider -> https://civitai.com/models/167525
I usually run with 0.3 ( <lora:skin_slider_v2_1_FACE:0.3> ) but sometimes I increase it slightly. It makes the skin texture less plastic and adds a lot of detail to it.
3) Detail Tweaker -> https://civitai.com/models/58390
My default is to go with 0.7 (<lora:add_detail:0.7>) but I tend to play with it sometimes. It really does bring the details up (or down, depending on the weight), and that includes not only the person but also the clothing and the background. Everything.
4) Default negatives: BadDream, (UnrealisticDream:1.2), realisticvision-negative-embedding
I picked these up from a famous photorealistic model and once I tested them, I never went back. Those really do make magic happen. I rarely go without them!
Also a fun fact. Once, by accident, I doubled them (used them manually in the prompt and then also included them in the styles dropdown). Turns out that not only does it not break anything, it even helps a little (although sometimes it is too much).
This makes me think that you could just increase the weights of those Embeddings rather than typing them twice. It is worth taking a closer look :-) I'll leave it to the curious.
In general - there are plenty of positive/negative embeddings that bring up the quality - and they seem to work well together. It is just a matter of preference - how many helpers one wants :)
Other good contenders are: epiCPhoto, epiCPhoto-neg, CyberRealistic_Negative-neg
My take from this is: if an image looks fake then adding some of those positives/negatives might improve the quality a lot. Just play with them.
They are definitely way better than the meme negative prompts ("wall of text" prompts with missing arms, deformities, etc)
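For the curious who want to try weighting the default negatives instead of typing them twice, here is a small sketch that builds the negative prompt using the same `(token:weight)` syntax already used for UnrealisticDream. The function name is made up, and I have not established which weight (if any) matches the doubled version, so treat the numbers as things to experiment with.

```python
def weighted_negatives(weights):
    """Build a negative prompt, wrapping tokens as (token:weight) unless weight is 1."""
    parts = []
    for token, weight in weights.items():
        parts.append(token if weight == 1.0 else f"({token}:{weight})")
    return ", ".join(parts)


# The default combo from tip 4.
print(weighted_negatives({
    "BadDream": 1.0,
    "UnrealisticDream": 1.2,
    "realisticvision-negative-embedding": 1.0,
}))
```

Raising `BadDream` to, say, 1.3 in that dict would be one way to test the idea without duplicating the embedding.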
Attachments
I have included the following files in the attachments:
articlesamples1 and articlesamples2 are some samples using the models available on civitai - with all the metadata
localBrowser.zip - it is a local version of https://malcolmrey.tiiny.site/ (but since it is local - it is just a snapshot of the current state)
globalallpeople.txt - it's a wildcard of most of my models that have all three types trained. The list is missing a lot, and many of the models can still only be found on buymeacoffee, but I figured you may want to get it, copy-paste the ones you want, and tweak them accordingly
Thanks
Thank you for reading - please write what you think: has it helped you, guided you in some direction, or maybe it was too long? :)
I always love to read the comments. I'm also learning this way, and even after more than a year I feel like we are all still learning a lot every day :)
And without further ado, I wish you all a Happy New Year!
p.s. if you want to support me and also get early access to many models (or just request a new model!) - please visit my coffee site :)
Cheers!