After a lot of great feedback and nice messages about my work and checkpoints, I wanted to quickly create a V3 of AlterBan, since V2 has some difficulties with hands and prompt adherence (my "let's test it" bounty helped me identify those drawbacks).
Special Thanks
Before getting into the explanation, here are my "special thanks". These users have been a great support by using my models (a lot!), posting pictures, reacting to my articles and generally being absolutely adorable with me here: DeskGrenade, missfidonyo, AstroVariant, DarkFun, mommymia, Gladas, rambo943, Edenfel, Destinyfaux, SlabBulkhead, Lelouch13, kotapara, xerena, ooyamada, DarkfireAI, Dimch, Maly007 and Neoni (who is not on Civitai anymore but is still creating amazing content).
I hope I am not missing anyone, and thank you all!
Please check out their work and send them some love.
The long story... but shorter.
So... I was about an hour into writing my article, with multiple images and so on, when my browser crashed and I lost everything T_T (I should have known better and saved a draft regularly).
This new version of this article will be shorter, sorry T_T
The basic idea is to fix two points:
Hands, once and for all, as much as possible
Prompt adherence
Hand fixing
I decided to use my usual method for fixing hands, as explained in a previous article: merging checkpoints + LoRA. Bonus points if I can introduce a bit of "new stuff" into AlterBan.
Here are the first few steps:
AnBan AlterV2 + 0.8 x HandPatch "4merge" = AlterV2.1
AlterV2.1 x 0.9 + Mature Ritual x 0.1 (using SuperMerger, CosineB method) = AlterVtest
AlterV2.1 x 0.8 + AlterVtest x 0.2 (regular weight sum) = AlterVtest2
AlterVtest2 + 0.6 x HandPatch "4merge" = AlterV3 (spoiler: only a release candidate, not the final version)
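The "regular weight sum" in the third step can be sketched in a few lines of Python. This is only an illustration of the idea, not what the merge tool actually runs: the function name, the keys and the toy values are made up, and plain floats stand in for real weights. The same arithmetic should work on tensors loaded with `load_file`, since it only relies on `*` and `+`.

```python
def weighted_sum(state_a, state_b, alpha):
    """Blend two checkpoints key by key: (1 - alpha) * A + alpha * B.

    Keys that are missing from B are copied from A unchanged.
    """
    merged = {}
    for key, value_a in state_a.items():
        if key in state_b:
            merged[key] = (1.0 - alpha) * value_a + alpha * state_b[key]
        else:
            merged[key] = value_a
    return merged


# Toy stand-ins for real checkpoints (hypothetical keys, floats instead of tensors).
alter_v21 = {"unet.w": 1.0, "unet.b": 0.0, "only_in_a": 5.0}
alter_vtest = {"unet.w": 2.0, "unet.b": 1.0}

# AlterV2.1 x 0.8 + AlterVtest x 0.2
alter_vtest2 = weighted_sum(alter_v21, alter_vtest, alpha=0.2)
print(alter_vtest2)
```

With `alpha=0.2`, every shared weight moves 20% of the way from the first model toward the second one.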
The step that differs from my usual recipe is number 2.
I used Mature Ritual to give my subjects a more "adult" feel, and I used a special calculation method from SuperMerger: CosineB.
Basically, even with a low weight in the merge, the overall composition of the result stays closer to model B (so, Mature Ritual here). I wanted to keep the AlterBan style but change the composition.
At this point, it was time to tackle the second point.
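Judging by its name, CosineB is built on cosine similarity between the weights of the two models. The exact formula the extension uses is not reproduced here. The sketch below only shows the underlying measurement on toy data: the function name and the example "layers" are made up, and plain lists stand in for flattened weight tensors.

```python
import math


def cosine_similarity(weights_a, weights_b):
    """Cosine of the angle between two flat weight vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(weights_a, weights_b))
    norm_a = math.sqrt(sum(a * a for a in weights_a))
    norm_b = math.sqrt(sum(b * b for b in weights_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Toy "layers" (hypothetical values).
layer_a = [1.0, 0.0, 2.0]
layer_b = [2.0, 0.0, 4.0]  # same direction as layer_a, only scaled
layer_c = [0.0, 3.0, 0.0]  # orthogonal to layer_a

same_direction = cosine_similarity(layer_a, layer_b)
orthogonal = cosine_similarity(layer_a, layer_c)
print(same_direction, orthogonal)
```

A merge method can use a score like this to decide, layer by layer, how much of each model to keep, instead of applying one fixed weight everywhere.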
Prompt adherence
SDXL does its job in three main steps:
Analyse the prompt via CLIP and generate numeric values describing the concepts it understands from the user's request.
Push these numbers into the UNET, which generates a latent image from them (thanks to its training).
Feed the latent to the VAE to get the picture.
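These three parts live side by side in a single checkpoint file, and the key names tell them apart. Here is a small sketch that sorts keys by component, assuming the usual single-file SDXL layout (`conditioner.` for the CLIP text encoders, `model.diffusion_model.` for the UNET, `first_stage_model.` for the VAE). The helper names are made up and the sample keys are only examples of that layout.

```python
from collections import Counter

# Assumed key prefixes of a single-file SDXL checkpoint.
PREFIXES = {
    "conditioner.": "CLIP",
    "model.diffusion_model.": "UNET",
    "first_stage_model.": "VAE",
}


def component_of(key):
    """Return the component a checkpoint key belongs to, or "other"."""
    for prefix, name in PREFIXES.items():
        if key.startswith(prefix):
            return name
    return "other"


def count_components(keys):
    """Count how many keys fall into each component."""
    return Counter(component_of(key) for key in keys)


# Example keys; with a real file you would pass the keys of load_file(...).
sample_keys = [
    "conditioner.embedders.0.transformer.text_model.embeddings.position_embedding.weight",
    "conditioner.embedders.1.model.ln_final.weight",
    "model.diffusion_model.input_blocks.0.0.weight",
    "first_stage_model.decoder.conv_in.weight",
]
counts = count_components(sample_keys)
print(counts)
```

Running this on the keys of a real checkpoint is a quick way to check which part of the model a script is about to touch.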
If a model has poor prompt adherence, it is probably because the CLIP is not feeding the right numbers to the UNET (or because the UNET is heavily biased toward the same outputs).
What makes Pony and Illustrious different from base SDXL is the fine-tuning they received. Both the CLIP and the associated UNET are aware of new concepts, characters, styles and so on. And when we merge models, we also merge the CLIP, which can lead to some concepts being less well understood.
But what if I could just "reset" the CLIP and hope the modified UNET is still able to handle the numbers it gets? Let's try this, back to my Python BS:
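from safetensors.torch import load_file, save_file

# Merged checkpoint (release candidate) and the original Illustrious XL base
orig = load_file("AnBan_AlterV3.fp16.safetensors")
illus = load_file("illustriousXL_v01.safetensors")

new = {}
for k in orig:
    # "conditioner" keys hold the text encoders (CLIP): take them from Illustrious
    if "conditioner" in k and k in illus:
        new[k] = illus[k]
    else:
        # Everything else (UNET, VAE) stays as in the merged checkpoint
        new[k] = orig[k]

save_file(new, "AnBan_AlterV31.fp16.safetensors")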
Since the original Illustrious XL is still used as base in LoRA training, let's try and put its CLIP back in AlterBan. What was the result?
Well, first, it did not break everything. Second, it gave a bit of depth to the result. Let's test with "The Angel":
The "bangs" from the original prompt were lost in the fight. Maybe not the best solution to ensure I am sticking to the prompt, but I still feel I am onto something.
I'll investigate more, but for now, let's tone it down: AlterBan V3.2 = AlterBan V3 x 0.6 + AlterBan V3.1 x 0.4
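Since V3 and V3.1 only differ in their CLIP weights, this weighted sum leaves the UNET and the VAE as they were (0.6 x w + 0.4 x w = w, up to rounding) and only blends the text encoder: 60% merged CLIP, 40% Illustrious CLIP. Here is a rough sketch of that equivalent "CLIP only" blend. The function name, keys and values are made up, and floats stand in for tensors.

```python
def blend_text_encoder(merged, base, base_weight):
    """Blend only the "conditioner." (CLIP) keys toward the base model.

    All other keys (UNET, VAE) are copied from the merged model unchanged.
    """
    out = {}
    for key, value in merged.items():
        if key.startswith("conditioner.") and key in base:
            out[key] = (1.0 - base_weight) * value + base_weight * base[key]
        else:
            out[key] = value
    return out


# Toy stand-ins (hypothetical keys and values).
alter_v3 = {"conditioner.clip": 1.0, "model.diffusion_model.w": 3.0}
illustrious = {"conditioner.clip": 2.0, "model.diffusion_model.w": 9.0}

# 0.6 x merged CLIP + 0.4 x Illustrious CLIP, UNET untouched
alter_v32 = blend_text_encoder(alter_v3, illustrious, base_weight=0.4)
print(alter_v32)
```

Changing `base_weight` gives a simple dial between the merged CLIP (0.0) and a full reset to the Illustrious CLIP (1.0).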
This time, it's "Multiple wings" that didn't make the cut. What about LoRA? I'll remix this picture from a recently released LoRA by DeskGrenade. It uses AnBan AlterV2, so let's see with AlterV3.2.
The hands are good and the weapon blade is in a more logical position. This feels like it would be a good candidate for a release :D
OK, what about this one? With this LoRA from AstroVariant. Very low CFG here, complex prompt, lots of stuff can happen.
Love those boots :D The prompt has been followed as far as I can see; I am just missing hands for comparison.
Let's do another remix, this time with this LoRA from FreckledVixon. The prompt is VERY short, so the model matters a lot for the result, since nothing about background, quality or position is in there.
Not too bad! But the hands are again a mess T_T I'll have to do a few more tests before releasing this time. I don't want another "I should have tested more before uploading a 6.5GB file"... but I'll release a true V3 soon. :D
Thanks for reading!