Use CFG++ / PAG / SEG combo for sharper image.
For CivitAI users: CFG 3.0, CLIP skip 2, default VAE.
(215c) No quality tags are required, but excessive tags will not break either.
(256c) Slightly less consistint then 215c, because content seems less focus on human characters.
~~(255c) It won't obey your prompt. It would be a nice base model for training. 這個破模型比 SDXL1.0 更欠擬.~~ 255c is confirmed has broken recipe and picked wrong base model.

Abstract

I present AstolfoMix-XL, a merge model focusing on "exploring merging options", while being stuck in merging the popular model and waiting for SD3. Currently it is in anime style. Welcome back, Astolfo you are so cute!

Introduction

AstolfoMix-XL is the "scale up" merge from my previous models (SD1 / SD2), and hopefully find more useful ideas to expand my article, and reuse the valueable contents (and concepts) in the internet. The haystack is too large for me to merge by hand. With a dedicated merger (like mergekit) doing the procedure in auto without saving intermediate models, I expect it can produce more general but high quality contents out of trained materials.

This is tough. 這條學術路線究竟是復興, 還是倒退?

Model merging is an extented topic on ensemble learning, there are quite a bit of merging algorithms for AI models, and it is being discussed formally because LLM models are generally huge and too expensive to train. Many merging algorithms are proposed in paper's own code repo (DARE), or being forggen (MergeMany in Git Rebasin), or being undocuemented ("rotate" with more "matrix operation"), meanwhile dedicated merger (mergekit, or supermerger) is being devloped.

With a slight belief (MDP=AR(1) i.e. LLM merge to SD merge is viable), quite a bit of experiments (SD1 as ModelSoup, SD2 as model selection without alignment, then PR, PR, and PR because there is no codebase in the wild), and some luck, I am brave enough to "get it done" and publish it. It is very hard to with ~~little to none~~ community support (especially most model merger are disappeared after NAI v3, or the finetune hype, or realized that MBW theory is actually invalid ~~it works because of introducing parameters with optimization loop only, which is not artistic at all~~).

For example, from the released recipe of AnythingXL, I can interprept it as "average of 7 models, with 14.2857% of each model, meanwhile taking the favourite model as double i.e. 28.5714%". Meanwhile, PonyMagine is successfully apply DARE on top of a custom recipe.

Methodology

融模可以不玄, 只是門檻有點高 (魔改算式) 罷了.

I will make a seperated discussion here, or an article in this platform, or see my article in Github (this also), or seperated article in CivitAI if it is not written yet.

Since model merging in SD at this level is lack of discussion, I have nothing to reference but carefully make justification and make insight. From the study with "related works" (I read for multiple times because it is close to ML / Math discussion), I expect that algorithm modification will be essential.

Therefore I first get the original implementation done (took months), and finally perform analysis on the inherited mathmatical properties. Soon I found that task vector should be normalized (subset of rescale), meanwhile sign election should be based form identity instad of signed movement, because MDP under SD is suffering gradient problems like RNN does.

Meanwhile, I don't have resources to either train model (hardware / time / human resources, or interest), or even conduct a throughful evaluation (like team lycoris and deepghs) of a model. All I can do is conduct a subjective HTP Test towards the model, assuming it projects its natural behaviour. Therefore you will see "a pink hair boy interacting with a car, with random but filled backgound".

Experiements

(215a) Discovered models, filtering process, the merging script (End-to-End with single click!) and the 387-line recipe are generated.

(215b) Extended use of the merger.

(215c) The dedicated merger which was left uncompleted.

Parameter Searching

Parameters for merging algorithms are found by randomly selecting 10% models out of the model pools (e.g. 20 out of 192) for viewing the effect. It requires much 20x less time to merge, and still able to represent the estimated final result.

Prompts

Even no prompts works. 質量詞真的需要嗎?

I have tested it will long prompts, it works fine. In contrast, you will see most of my posted images are just a few words, and no negative prompt (because I seldom need to exclude things). However, when I add quality tags, it may produce worse or even broken images, becuase the recipe models are fighting each other with contradicted knowledge.

CFG / STEPS / Additives

It is as wide as the SD1 version. Currently I find that "CFG 3.0 + PAG 1.0 + mimic 1.0 phi 0.3 + FreeU Default" will be good. "48 steps of Euler" is enough for generation, however I still favor 256 STEPS + 64 Highres.

Discussion

Even SOTA merging algorithms cannot learn all concepts from all models, it shouldn't be a replacement of Trained Models / LoRAs, more like a base model for further developement. 更好的底模都因為社群的誤解, 或者短視, 被忽略了. 從 SD1 / NAIv1走來, 大家學會了甚麼?

It is a pity, or "final nail to the coffin", that Pony is accepted just because of its NSFW ability, overriding any technical consideration, making resources not sustainable.

I am aware of low attention throughout the journey (that is huge improvement since Baseline Model, the image quality is increased, with less "halo effect"), but I must get it done to leave a mark in the (art) history. I know no one will be interested on developing open source model, because the incentive is way too low, no matter material or just spiritual support. There is no more animagine, meanwhile some famous modellers, or less famous, are gone.

I expect SD community should, or will be forced, to consider merging thousands of LoRAs back to the base model, and keep the "artistic movement" rolling. 將來應該會有人需要大批融 LoRA, 甚至底模, 這總需要有一點辦法吧.

Pseudorandom in extreme condition

In 255c, it looks like the "noise across model weights" is coming back because of extreme normalization across model weights. It resembles vanishing gradient problem in the sense of training trajectory. It marks the end of this series, and I should train the model properly... and I did. 終於融模融到撞牆了.

Conclusion

The new merger gives me ability to carry on the research on fancy merging algorithms on large number of models, and keeping the model structure same and convenient. I may update this article when I successfully produce and test models based from different merging algorithms.

Appendix

See Experiment session for recipes.

My Workstation for this mix (merge time is 36.2 hours, peak usage of RAM for DGMLA-216 is ~~1.446~~3.500TB, scale with total of model counts).

License: Fair AI Public License 1.0-SD
For more details, please see the license section of ANIMAGINE XL 3.0 / Pony Diffusion V6 XL / NoobAI-XL

Hint: It is impossible to proof that you have merged / finetuned my model even it is (1-h) similar. Remember use toolkit to fix VAE ~~(metadata is not preserved)~~
Hint: My scripts are MIT. ~~I'm happy to sell you my workstation.~~

AstolfoMix-XL