Published | Jun 8, 2024 |
Usage Tips | Clip Skip: 2 |
Hash | AutoV2 BC747CAFD1 |
AstolfoMix-XL
DGMLA-216 merge (Drop w/ GeoMedian and LA) of 216 discovered SDXL models (unfiltered). See this article for a description. Go to the HuggingFace model page for a sneak peek before the "kind of official release". Contents / theories will not be repeated from the SD1 version or SD2 version; the contents below are exclusive to SDXL. Full documentation / materials are on Github.
CFG++ / PAG / SEG are OP. This negative-prompt (UC) power: never mind the ceiling, this is the boundary of the universe, right?
For CivitAI users: CFG 4.5, CLIP skip 2, default VAE.
No quality tags are required, but excessive tags will not break it either.
DGMLA is released in a rush. I honestly don't know what visible improvement the Fermat point gives over the geometric centroid.
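For reference, the geometric median (the multi-point generalization of the Fermat point) can be approximated with Weiszfeld's iteration. A minimal NumPy sketch of the idea, with flattened weights and names of my own choosing rather than the actual DGMLA code:

```python
import numpy as np

def geometric_median(points: np.ndarray, iters: int = 100, eps: float = 1e-8) -> np.ndarray:
    """Approximate the geometric median of a point set via Weiszfeld's iteration.

    points: shape (n_models, n_params), e.g. flattened model weights.
    The geometric median minimizes the sum of Euclidean distances to all points,
    whereas the centroid (plain average) minimizes the sum of squared distances.
    """
    median = points.mean(axis=0)               # start from the centroid
    for _ in range(iters):
        dists = np.linalg.norm(points - median, axis=1)
        dists = np.clip(dists, eps, None)      # avoid division by zero
        weights = 1.0 / dists
        new_median = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_median - median) < eps:
            break
        median = new_median
    return median

# Toy check: for the corners of a triangle this lands near its Fermat point.
print(geometric_median(np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])))
```

Unlike the centroid, the geometric median is more robust to outlier models, which is the usual motivation for using it in a large merge.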
Abstract
I present AstolfoMix-XL, a merge model focused on "exploring merging options", made while being stuck merging the popular models and waiting for SD3. It is currently in an anime style. Welcome back, Astolfo, you are so cute!
Introduction
AstolfoMix-XL is the "scale up" merge of my previous models (SD1 / SD2), hopefully finding more useful ideas to expand my article and reusing the valuable contents (and concepts) on the internet. The haystack is too large for me to merge by hand. With a dedicated merger (like mergekit) running the procedure automatically without saving intermediate models, I expect it can produce more general but high-quality content out of trained materials.
Related Works
This is tough. Is this academic path a revival, or a regression?
Model merging is an extended topic of ensemble learning. There are quite a few merging algorithms for AI models, and the topic is discussed formally because LLMs are generally huge and too expensive to train. Many merging algorithms are proposed only in a paper's own code repo (DARE), or forgotten (MergeMany in Git Re-Basin), or left undocumented ("rotate" with more "matrix operations"), while dedicated mergers (mergekit, or supermerger) are being developed.
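As a concrete illustration of the "Drop" family mentioned above, DARE randomly zeroes out most elements of a task vector (fine-tuned weights minus base weights) and rescales the survivors so the expected value is unchanged. A rough sketch under that reading, with function and variable names of my own invention:

```python
import torch

def dare_drop(task_vector: torch.Tensor, drop_rate: float = 0.9) -> torch.Tensor:
    """Drop-And-REscale on a task vector (fine-tuned weights minus base weights).

    Each element is kept with probability (1 - drop_rate); the kept elements
    are rescaled by 1 / (1 - drop_rate) so the expected value of the task
    vector is preserved.
    """
    keep_prob = 1.0 - drop_rate
    mask = torch.bernoulli(torch.full_like(task_vector, keep_prob))
    return task_vector * mask / keep_prob

# Usage idea: merge by adding the averaged, sparsified task vectors back onto the base.
# merged = base + torch.stack([dare_drop(ft - base) for ft in finetuned_weights]).mean(dim=0)
```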
With a slight belief (MDP = AR(1), i.e. LLM merging carries over to SD merging), quite a few experiments (SD1 as ModelSoup, SD2 as model selection without alignment, then PR, PR, and PR because there is no codebase in the wild), and some luck, I am brave enough to "get it done" and publish it. It is very hard to do with little to no community support (especially since most model mergers disappeared after NAI v3, or the finetune hype, or after realizing that MBW theory is actually invalid: it only works because it introduces parameters into an optimization loop, which is not artistic at all).
For example, from the released recipe of AnythingXL, I can interpret it as "an average of 7 models, with 14.2857% of each model, while taking the favourite model as a double share, i.e. 28.5714%". Meanwhile, PonyMagine successfully applies DARE on top of a custom recipe.
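In other words, that reading of the recipe is just a weighted arithmetic mean over state dicts. A hypothetical sketch (loader, paths, and weights are placeholders, not the actual AnythingXL recipe):

```python
import torch

def weighted_average(state_dicts: list[dict], weights: list[float]) -> dict:
    """Weighted arithmetic mean over model state dicts (weights should sum to 1)."""
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for sd, w in zip(state_dicts, weights))
    return merged

# Hypothetical usage:
# sds = [torch.load(p, map_location="cpu") for p in checkpoint_paths]  # plain state dicts
# merged = weighted_average(sds, weights=[1 / len(sds)] * len(sds))    # equal shares
```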
Methodology
Model merging does not have to be mystical; the entry bar is just a bit high (hacking the formulas), that's all.
I will make a separate discussion here, or an article on this platform; also see my article on Github (this one too), or a separate article on CivitAI if it has not been written yet.
Since model merging in SD at this level lacks discussion, I have nothing to reference but to carefully justify my choices and draw my own insights. From studying the "related works" (which I read multiple times, because they are close to ML / math discussion), I expect that algorithm modification will be essential.
Therefore I first got the original implementations done (which took months), and finally performed analysis on the inherited mathematical properties. Soon I found that the task vector should be normalized (a subset of rescaling), and that sign election should be based on identity instead of signed movement, because the MDP under SD suffers from gradient problems like an RNN does.
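To illustrate the first of those two points (task-vector normalization as a special case of rescaling), here is a minimal sketch under my own assumptions; it is not the actual DGMLA implementation, and the identity-based sign election is omitted:

```python
import torch

def normalized_task_vector(finetuned: torch.Tensor, base: torch.Tensor) -> torch.Tensor:
    """Task vector (fine-tuned minus base) rescaled to unit L2 norm.

    Normalizing before merging keeps every model's contribution comparable in
    magnitude, instead of letting heavily fine-tuned models dominate the merge.
    """
    tv = finetuned - base
    norm = tv.norm()
    return tv / norm if norm > 0 else tv
```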
Meanwhile, I don't have the resources to either train a model (hardware / time / human resources, or interest), or even conduct a thorough evaluation (like team lycoris and deepghs do) of a model. All I can do is conduct a subjective HTP Test on the model, assuming it projects its natural behaviour. Therefore you will see "a pink-haired boy interacting with a car, with a random but filled background".
Experiments
Parameter Searching
Parameters for the merging algorithms are found by randomly selecting 10% of the models out of the model pool (e.g. 20 out of 192) to preview the effect. This takes roughly 20x less time to merge, and is still able to represent the estimated final result.
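A trivial sketch of that subsampling step (the pool size and ratio mirror the example above; everything else is illustrative):

```python
import random

def sample_pool(model_paths: list[str], ratio: float = 0.10, seed: int = 42) -> list[str]:
    """Randomly pick ~10% of the model pool for a cheap trial merge."""
    random.seed(seed)
    k = max(1, round(len(model_paths) * ratio))
    return random.sample(model_paths, k)

# e.g. sample_pool(paths_to_192_checkpoints) -> roughly 20 paths for a quick preview merge
```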
Prompts
Even an empty prompt works. Are quality tags really necessary?
I have tested it with long prompts and it works fine. In contrast, you will see that most of my posted images use just a few words, and no negative prompt (because I seldom need to exclude things). However, when I add quality tags, it may produce worse or even broken images, because the recipe models fight each other with contradictory knowledge.
CFG / STEPS / Additives
The usable range is as wide as the SD1 version's. Currently I find that "CFG 3.0 + PAG 1.0 + mimic 1.0 phi 0.3 + FreeU default" works well. 48 steps of Euler are enough for generation; however, I still favour 256 steps + 64 hires steps.
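For reference, those recommendations collected into a plain settings dictionary (the key names are my own shorthand, not tied to any particular UI or API):

```python
# Key names are my own shorthand; values are the recommendations above.
recommended_settings = {
    "cfg_scale": 3.0,
    "pag_scale": 1.0,
    "mimic": 1.0,
    "phi": 0.3,
    "freeu": "default",
    "sampler": "Euler",
    "steps": 48,   # sufficient; the author still prefers 256 steps + 64 hires steps
}
```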
Discussion
Even SOTA merging algorithms cannot learn all concepts from all models. This shouldn't be a replacement for trained models / LoRAs; it is more like a base model for further development. Better base models keep being ignored because of the community's misunderstanding, or short-sightedness. Coming all the way from SD1 / NAIv1, what has everyone learned?
It is a pity, or the "final nail in the coffin", that Pony is accepted just because of its NSFW ability, overriding any technical consideration and making resources unsustainable.
I am aware of the low attention throughout the journey (even though this is a huge improvement over the baseline model: image quality has increased, with less "halo effect"), but I must get it done to leave a mark in (art) history. I know no one will be interested in developing an open-source model, because the incentive is way too low, whether material or just spiritual support. There is no more animagine, and meanwhile some famous modellers, and some less famous ones, are gone.
I expect the SD community should, or will be forced to, consider merging thousands of LoRAs back into the base model, to keep the "artistic movement" rolling. In the future someone will need to merge LoRAs, or even base models, in bulk; there had better be some method for that.
Conclusion
The new merger gives me the ability to carry on researching fancy merging algorithms across large numbers of models, while keeping the model structure the same and convenient to use. I may update this article when I successfully produce and test models based on different merging algorithms.
Appendix
See the Experiments section for recipes.
My workstation for this mix (merge time is 36.2 hours; peak RAM usage for DGMLA-216 is 1.446–3.500 TB, scaling with the total model count).
License: Fair AI Public License 1.0-SD
For more details, please see the license section of ANIMAGINE XL 3.0 / Pony Diffusion V6 XL