Content Advisory: Length
Greetings dear reader, I'd like you to do me a favor. Just flick that scroll wheel on your mouse, and take a look at the size of the scroll bar. See how little it is? See how little it moves? Yep, this guide is long. Around 25 pages, single spaced. This is literally the longest document I've ever written in my life.
Here is the general outline of the guide:
Intro & Setup - Assumptions and what you need if you want to try to follow along.
Strategy & Prompts - Methods for validating if a merge is good or not.
Approaches 1, 2, & 3 - We examine some basic approaches.
Foundational Knowledge - This is the knowledge needed to fix why Approaches 1, 2 and 3 aren't great.
Approach 4 - In my opinion, the best way to merge models.
Conclusion - Closing thoughts.
Yeah, I'm not reading any of this
Yeah, this is valid criticism. I have tried to use a conversational style so as not to sound totally dry and boring. I will also give each section a summary called Management Advice: at the end. However, a huge part of this guide's length comes from images. If I had to guess, I'd say it's 60% images and 40% text. I actually recommend increasing the zoom a little bit in your browser so this guide isn't quite so dense.
Content Advisory: Lewd Imagery & Nudity
This guide contains female nudity in the form of a Validation Prompt and its resulting images after the merges. This guide does not contain any sexually explicit activity. The near entirety of the nudity would be rated PG-13, like the scene from Titanic where Rose tells Jack "Draw me like one of your French girls."
The simple truth is that when creating a merge, it is valuable to test the model against a wide range of predefined validation prompts. One of mine includes nudity. In fact, it's one of my most valuable and reliable validation prompts, and that's why it's included in this guide. However, it is my intention to avoid using that prompt gratuitously.
This guide also contains lewd imagery, but I assume you are probably ok with that.
If you will be an adult about it, then so will I.
Text Color Guide
I'm trying something new with how I write guides: color coding. Let me know if you like it.
Grey Paragraph Text = This is the meat of the guide.
Green Paragraph Text = This is important, but it's not critical.
Orange Paragraph Text = This is important, and you should read it.
Red Paragraph Text = This is really important, you should really read it.
Teal Paragraph Text = Dialog
Purple Paragraph Text = My opinions.
Who is this guide for?
If you ever wished a model existed that fit your style, or wished you could change something about a model you use a lot, but just don't know how to make it a reality, this guide is for you.
Merging models is both much easier and much harder than you think. Making generic merges is trivial. Making something excellent is hard. And it's a lot harder if you don't have any knowledge of how the tools work; learning the tools is the difficult part.
This guide assumes:
You know your way around the Stable Diffusion Webui.
You have quite a bit of experience generating images, and are at least somewhat familiar with the process of Stable Diffusion.
You are working with SD1.5 as it's just easier to work with, but the ideas should all still be similar for SDXL.
You have a computer with enough CPU power, GPU power, and disk space to be able to perform repeated merges in a reasonable amount of time.
You are familiar with the styles.csv file.
You know how to make an X/Y/Z Plot.
You have probably poked around in the Stable Diffusion Webui menus and seen a tab called the "Checkpoint Merger".
This is the default checkpoint merger tool that comes with A1111, and well... it's pretty basic. It can only perform generic blends, which is fine, but it doesn't offer any control above and beyond that. We will use this tab only for Approach 1.
Super Merger Extension
This guide instead will be focused on learning how to use the Super Merger Extension, as it offers a much higher degree of control over how, and more importantly what, we merge into our models.
Now, at first glance, this looks like one intense extension, and to be honest, it is. But remember how daunting the Stable Diffusion Webui first felt when you loaded it up for the first time? Well, it's probably a lot easier for you now, and it's the same story with this extension. It is a complicated extension, no doubt, but it will become more and more familiar as you use it.
While there is a lot of stuff here, none of it is particularly difficult to learn, and if I am being completely transparent, you probably won't need half of it.
You can install the Super Merger extension like any other extension in the extensions tab in A1111.
Management Advice: Install the Super Merger extension. It's really good.
Basics of Model Merging
Ok, RestlessDiffusion, you keep talking about "merging models", so what does it actually mean to merge a model?
Well, if you remember back to your high school classes, you might remember a little class called "Linear Algebra". Now, I know what you're thinking...
Wait I don't remember anything about linear algebra, matrix manipulation, or any of that other nonsense. Are you seriously going to torture me with high school math?
And the answer is: Of course not! I don't remember any of that either!
But look, here's the deal:
Stable Diffusion models are just a bunch of fancy rows and columns consisting of a bunch of numbers, i.e. a matrix. These numbers are called the weights. Merging models is simply running calculations on these weights.
Don't worry, you don't have to do any of the actual math yourself; that is what the Super Merger extension does. You will have to adjust how much of model A and model B should go into the equations, but again, this is much easier than it sounds. The hard part is understanding what the math is going to produce, so you aren't wasting your own time making merges that you don't actually want.
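To make the math concrete, here is a minimal sketch of what a Weighted Sum merge does to a single weight matrix. This is a toy illustration using numpy, not how Super Merger is actually implemented; the extension runs the same kind of calculation for you across every layer of the model.

```python
import numpy as np

# Two tiny stand-in "models": a real model holds thousands of matrices
# like these, but the math is the same on every one of them.
model_a = np.array([[1.0, 2.0],
                    [3.0, 4.0]])
model_b = np.array([[5.0, 6.0],
                    [7.0, 8.0]])

def weighted_sum(a, b, multiplier):
    # result = A * (1 - m) + B * m
    # m = 0.0 -> 100% model A, m = 1.0 -> 100% model B
    return a * (1.0 - multiplier) + b * multiplier

merged = weighted_sum(model_a, model_b, 0.5)
print(merged)  # every weight lands exactly halfway between A and B
```

That's the whole trick: slide the multiplier toward 0 and the merge looks like model A, slide it toward 1 and it looks like model B.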
Management Advice: Ain't nobody got time for Linear Algebra. We just need to learn how to tell Super Merger what we want it to do.
Merge Validation Strategy
Apples to Apples Comparison
How will you know if your merge is successful? I don't mean this from the literal point of view of whether the extension was able to complete the merge. I mean how will you know if your merge captures the style you are going for? Merging models is a process of trial and error, with fine adjustments. With this in mind, an unbiased way to judge whether a merge successfully renders your desired style is of utmost importance. Therefore, we should make a selection of prompts beforehand that target the style and overall aesthetic of the desired merge.
Now, I'm going to let you in on a little secret. I like to generate lewds (yeah big surprise on Civitai, I know). However, since that is the content I enjoy, that's also going to be what my prompts target as criteria for success. You should figure out what kind of content you like, and craft validation prompts to act as the measurement for success.
Here are some of the prompts I use when judging a merge. All these images were generated with sxzLuma because it's generally a really great model that crushes prompts. The seed is random on all of these example images, but it is not random in the copy-paste section just below. I did this because I want to show you what I think a successful image looks like.
These prompts do a really fast and dirty upscale that doesn't add much quality, but it does iron out those non-upscaled blemishes from default Stable Diffusion. Feel free to increase the upscale power if your computer can handle it. However, for the sake of this guide's quality, I will provide a cleaner, more upscaled version of each validation image.
These prompts have no quality control words in them like "masterpiece" or "low quality", or Textual Inversions like bad-hands-5. This is because I want to be able to switch those out depending on what type of merge I am going for. I suggest you use your own quality control schemes when you are trying to create your merges. I did use my own quality control words for each image in this guide, and I have included those quality control word blocks after the prompts.
These are copy-pastable into A1111. Just paste each block into the prompt box, and then click the little button underneath the Generate button that is a blue arrow pointing down and left. This will apply that prompt and its settings to the relevant settings sliders in A1111. These also assume you have the 4x-UltraSharp upscaler installed.
Prompt #1: Corporate Executive
Fun fact, this prompt is also the banner image for this guide (at the very top), but the banner image uses a different model. It tests how the merge does with complicated structures like many buildings, and with scene composition. It also tests if the merge can refrain from generating NSFW content, even when it's prodded to generate something risque. Some merges really struggle with that last point, depending on what you're merging.
This prompt is used as a validation only in this guide. Prompt #4 is the primary working prompt.
1girl, alluring, revealing, hourglass figure, (soft smile:1.2), blonde, hair up in bun, (corporate executive:1.3), (navy blazer:1.2), (pencil skirt:1.1), (silk blouse:1.1), detailed background, balcony, city night life, (water front, river:1.2), bridge, night time, modern city skyline, (facing forward:1.2), (panorama:1.3), skyscrapers, glass buildings, bustling city, (looking at viewer:1.4), standing, (arms behind back:1.3), (cleavage:1.3) Negative prompt: nude, naked, nsfw, nipples, vagina, pussy, topless, bare breasts, (fat:1.25), (sly:1.5), (sleeveless:1.3), (side profile:1.3), (simple background:1.4) Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1000, Size: 512x768, Denoising strength: 0.4, Clip skip: 2, Hires upscale: 1.5, Hires steps: 7, Hires upscaler: 4x-UltraSharp
This is exactly what I am looking for. Nice clean lines regarding the buildings, a slight amount of cleavage, which is all I asked for. She is blonde with a bun, also with an hourglass figure. Her arms are behind her back, meaning the merge can respond to precise pose control, and her attire is as described. Finally, the scene fits perfectly within the prompt's boundaries as well.
Prompt #2: Nude on the bed (NSFW)
Let's get this out of the way right now. This prompt isn't just an excuse to include nudity in this guide like you might think (which I don't blame you for thinking). No. This prompt tests how a merge renders fine details on humans, which is pretty important, seeing as humans are the most common subject of Stable Diffusion content.
Let's also not pretend like nudity isn't literally the most common content on Civitai. This is subtly important because it means it's the #1 most trained content across all models. If a model fails here, it's going to fail in other places.
Like it or not, nudity is a great measurement of success. So, if we are going to use nudity, let's at least make it "tasteful" and not vulgar.
So with that out of the way, what should we be looking for? Well, how is the hair shaded? What color is the hair? How do the eyes look? What color are the eyes? Are the arm pits normal? How natural is the pose? Do the breasts seem grounded in reality? How accurately are the nipples rendered? This is actually pretty telling, because if the nipples are rendered incorrectly, chances are the merge is going to render all kinds of other small details incorrectly. Is the belly button weird, like it's fleshy or hollow? There is almost nothing else in the prompt besides the human subject, yet many merges (and even base models) can actually struggle with this, as proportions are off, details are off, poses are twisted, etc.
This prompt is used as a validation only in this guide. Prompt #4 is the primary working prompt.
Side Note: A merge that cannot properly render a correct nude form is going to be shredded by prompt #3.
(cowboy shot:1.4), 1girl, (nude:1.3), (naked:1.3), (nsfw:1.3), hourglass figure, large breasts, busty, (wide hips:0.85), thin waist, (shoulder length hair:1.2), redhead, (hazel eyes:0.75), makeup, mascara, rosy cheeks, smokey eyes, lipstick, (beautiful, gorgeous:1.1), content, relaxed, soft smile, (looking at viewer:1.4), lying down on back, face up, with pillow, arms up, arm pits, hands behind head, sensual, erotic, detailed background, Luxurious bedroom, silk sheets, cosy, inviting Negative prompt: (cowboy:1.3), (medium shot:1.2), (clothing:1.3), (fat:1.25), (ugly, ugly face, average face, imperfect complexion:1.3), (simple background:1.4) Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1000, Size: 512x768, Denoising strength: 0.4, Clip skip: 2, Hires upscale: 1.5, Hires steps: 7, Hires upscaler: 4x-UltraSharp
This is exactly what I am looking for. How does the hair look? Great. What color is the hair? Redhead, just as I prompted (maybe a touch too brown). How do the eyes look? Great. What color are the eyes? Hazel, as prompted. Are the arm pits normal? Yes. Sometimes "over eager" merges will put extra vaginas into arm pits. There is none of that here. How natural is the pose? Natural. Do the breasts seem grounded in reality? Yes. How accurately are the nipples rendered? Accurately. Is the belly button weird, like it's fleshy or hollow? No, the belly button looks normal. How is her vagina rendered? Totally normal. No exaggeration, or weird misshaping (which is definitely a thing that can happen). What is the overall aesthetic of her skin? It's detailed and textured.
Prompt #3: Playboy Bunny
So it turns out that generating a Traditional Playboy Bunny Costume, like, from the Playboy Mansion, involves some pretty precise prompting, and some merges really struggle with it. Therefore, this prompt tests how a merge responds to really specific prompting. Here, sxzLuma CRUSHES it.
This prompt is used as a validation only in this guide. Prompt #4 is the primary working prompt.
1girl, petite, slim waist, small breasts, (shoulder length hair:1.2), ponytail, brunette hair, (black playboy bunny costume:1.3), (satin bodice:1.2), (bandless black nylon pantyhose:1.4), (bunny ears:1.3), (detached collar:1.2), (bow tie:1.2), (french satin wrists cuffs:1.4), white cuffs, (bare shoulders:1.3), (strapless:1.3), (bunny tail:1.6), (high heels ankle strap:1.1), makeup, mascara, rosy cheeks, smokey eyes, lipstick, (beautiful, gorgeous:1.1), flirty, smirk, tease, playful, naughty, (looking at viewer:1.4), sitting, (legs crossed:1.3), finger on lip, detailed background, night, midnight, night sky, luxurious night club, dim mood lighting, accent lighting, high end, upscale, posh, fancy ornate bar Negative prompt: fitness, toned, (large breast:1.3), (wide hips:1.3), (holding cocktail:1.3), (garters:2), (thigh highs:2), (thigh band:2), (thigh strap:2), (stay-ups:2), (hold-ups:2), (stockings:2), (fishnets:2), (tassels:1.3), (intricate leotard:1.1), (latex:1.3), (opera gloves:1.5), (detached sleeves:1.5), (bardot:1.4), (gloves:1.4), lingerie, zippers, (ugly, ugly face, average face, imperfect complexion:1.3), (pussy:1.4), (simple background:1.4) Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1000, Size: 512x768, Denoising strength: 0.4, Clip skip: 2, Hires upscale: 1.5, Hires steps: 7, Hires upscaler: 4x-UltraSharp
SxzLuma gets nearly everything correct, with one miss and one minor flaw.
As I mentioned, we get (nearly) everything. We have the:
Slim body type
White ears with pink inner ears
Brunette hair (no ponytail 😥️)
Make up, lipstick, mascara, smokey eye, rosy cheeks
White detached collar with black bow tie
Finger on lip, with anatomically correct hand.
Black, strapless, one-piece leotard
Black pantyhose that goes all the way up through the leotard.
White french satin cuffs with cuff links
White bunny tail
Luxurious nightclub setting
At night time
The only real flaws are the missing ponytail and the stupid thigh bands on the pantyhose that seem to plague pantyhose in every other model, so we'll let those slide. Other than those, this image is nearly flawless. Sadxzero, if you're reading this for some reason (why? you don't need my help), really well done my friend.
Prompt #4: Low word count
This is my standard working prompt for generating a merge. This is my "known quantity" prompt. The canary in the coal mine, so to speak. If a merge struggles with this, it's completely dead. If a merge aces this prompt but struggles with one of the other prompts, then I know I'm on the right track and just need to tweak some things. There is a small amount of "complexity" in this prompt just to keep the merge honest. For example, the turquoise forest-green bikini intentionally conflicts the colors, but it also contains the word forest, and I want to know if that will trip up a merge into generating a forest.
1woman, long auburn hair, hair-part, hazel eyes, turquoise forest-green bikini soft smile, beach, ocean, palm trees Negative prompt: bangs Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1000, Size: 512x768, Denoising strength: 0.4, Clip skip: 2, Hires upscale: 1.5, Hires steps: 7, Hires upscaler: 4x-UltraSharp
This is exactly what I expect. There are literally no flaws.
Prompts formatted for styles.csv
I would recommend adding these prompts to your styles.csv and then just set and forget the settings required, leaving you free to shuffle prompts around with the click of a button via the Styles dropdown. Here they are as copy-paste strings for you to drop into your styles.csv. I will use these later in the guide.
👔️ Prompt #1: Corporate Executive,"1girl, alluring, revealing, hourglass figure, (soft smile:1.2), blonde, hair up in bun, (corporate executive:1.3), (navy blazer:1.2), (pencil skirt:1.1), (silk blouse:1.1), detailed background, balcony, city night life, (water front, river:1.2), bridge, night time, modern city skyline, (facing forward:1.2), (panorama:1.3), skyscrapers, glass buildings, bustling city, (looking at viewer:1.4), standing, (arms behind back:1.3), (cleavage:1.3)","nude, naked, nsfw, nipples, vagina, pussy, topless, bare breasts, (fat:1.25), (sly:1.5), (sleeveless:1.3), (side profile:1.3), (simple background:1.4)"
🛌️ Prompt #2: Nude on the bed,"(cowboy shot:1.4), 1girl, (nude:1.3), (naked:1.3), (nsfw:1.3), hourglass figure, large breasts, busty, (wide hips:0.85), thin waist, (shoulder length hair:1.2), redhead, (hazel eyes:0.75), makeup, mascara, rosy cheeks, smokey eyes, lipstick, (beautiful, gorgeous:1.1), content, relaxed, soft smile, (looking at viewer:1.4), lying down on back, face up, with pillow, arms up, arm pits, hands behind head, sensual, erotic, detailed background, luxurious bedroom, silk sheets, cosy, inviting","(cowboy:1.3), (medium shot:1.2), (clothing:1.3), (fat:1.25), (ugly, ugly face, average face, imperfect complexion:1.3), (simple background:1.4)"
🐰️ Prompt #3: Playboy Bunny,"1girl, petite, slim waist, small breasts, (shoulder length hair:1.2), ponytail, brunette hair, (black playboy bunny costume:1.3), (satin bodice:1.2), (bandless black nylon pantyhose:1.4), (bunny ears:1.3), (detached collar:1.2), (bow tie:1.2), (french satin wrists cuffs:1.4), white cuffs, (bare shoulders:1.3), (strapless:1.3), (bunny tail:1.6), (high heels ankle strap:1.1), makeup, mascara, rosy cheeks, smokey eyes, lipstick, (beautiful, gorgeous:1.1), flirty, smirk, tease, playful, naughty, (looking at viewer:1.4), sitting, (legs crossed:1.3), finger on lip, detailed background, night, midnight, night sky, luxurious night club, dim mood lighting, accent lighting, high end, upscale, posh, fancy ornate bar","fitness, toned, (large breast:1.3), (wide hips:1.3), (holding cocktail:1.3), (garters:2), (thigh highs:2), (thigh band:2), (thigh strap:2), (stay-ups:2), (hold-ups:2), (stockings:2), (fishnets:2), (tassels:1.3), (intricate leotard:1.1), (latex:1.3), (opera gloves:1.5), (detached sleeves:1.5), (bardot:1.4), (gloves:1.4), lingerie, zippers, (ugly, ugly face, average face, imperfect complexion:1.3), (pussy:1.4), (simple background:1.4)"
👙️ Prompt #4: Low word count,"1woman, long auburn hair, hair-part, hazel eyes, turquoise forest-green bikini soft smile, beach, ocean, palm trees","bangs"
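If you would rather script this than hand-edit the file, the rows above follow A1111's three-column styles.csv layout: name, prompt, negative_prompt. Here's a small, hypothetical sketch using Python's csv module, which handles the quoting around all those commas for you (the file path is an assumption; point it at your own webui folder):

```python
import csv

# A1111's styles.csv has three columns: name, prompt, negative_prompt.
# The csv module quotes any field containing commas automatically.
def append_style(path, name, prompt, negative_prompt):
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([name, prompt, negative_prompt])

append_style(
    "styles.csv",  # adjust to wherever your webui keeps it
    "Prompt #4: Low word count",
    "1woman, long auburn hair, hair-part, hazel eyes, "
    "turquoise forest-green bikini soft smile, beach, ocean, palm trees",
    "bangs",
)
```

Either way works; the copy-paste strings above are the low-tech route to the same result.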
Quality Control Blocks for styles.csv
Below are the main quality control blocks I use to add quality to my images. RestlessExistence++ is a generic middle ground that kind of works with content that's not quite anime, but not quite photorealistic. I mainly use it for my own model RestlessExistence, as well as any other "artistic" models that aren't strictly anime or photorealistic.
For the purposes of this guide, I am going to stick to Generic Neg Text (Low Weight), unless otherwise noted.
Unless you know exactly what kind of merge you're going for, it's best not to influence the images that come out of the merge too much with a ton of quality control words. Let the model do most of the talking.
️ Generic Neg Text (Low Weight),,"(worst quality:1.5), (low quality:1.5), (normal quality:1.5), (b&w:1.2), (black and white:1.2)"
✏️ Restless Anime+,"(linework:1.1)","(worst quality, low quality:1.4), (panels:1.4), lowres, low resolution, watermark, bad-hands-5"
💯️💯️💯️ Restless Existence++,"(Maya 3d render:1.05), (masterpiece:1.3), (hires, high resolution:1.3), subsurface scattering, heavy shadow","(low quality:1.5), (normal quality:1.5), (lowres, low resolution:1.5), (FastNegativeV2:0.75), BadDream, (UnrealisticDream:1.25), (bad-hands-5:1.25)"
📷️📷️📷️ Restless Photorealistic++,"RAW photo, (masterpiece:1.3), subsurface scattering, heavy shadow, (high quality:1.4), (intricate, high detail:1.2), professional photography, HDR, High Dynamic Range, realistic, ultra realistic, photorealistic, high resolution, film photography, DSLR, 8k uhd, Fujifilm XT3","(low quality:1.5), (normal quality:1.5), (lowres, low resolution:1.5), (camera:1.5), (FastNegativeV2:0.75), BadDream, (UnrealisticDream:1.25), (bad-hands-5:1.25), underexposed, underexposure, overexposure, overexposed, canvas frame, cartoon, 3d, 3d render, CGI, computer graphics"
Management Advice: An unbiased selection of prompts that target the content you wish your merge to produce is vital to determining whether your merge is successful or not.
Zero-Shots, One-Shots, Two-Shots, Oh my!
What is a Zero-Shot, One-Shot, Two-Shot?
A zero-shot in this context actually comes from Large Language Models, like ChatGPT. It's where you give a prompt with no examples or context, ask the model to generate a specific output, and it only has one chance to get it right.
For example, if you asked ChatGPT to write a poem about cats, it could do it, no problem (this would be a zero-shot, because you provided no context about cats). However, if you asked ChatGPT to write a poem about YOUR cat, ChatGPT would need more context if you didn't want a generic poem about generic cats. Let's say you provided one instance of context about your cat. That would be a one-shot. Two instances of context? That's a two-shot, and so on. This is a little overly simplistic, but it's the general idea.
In the context of merging Stable Diffusion models, a Zero-Shot is using a single predefined seed, and validating the results of the merge off of that one seed. The model doesn't get a second chance. No adjustments to the prompt. Nothing. A one-shot in this context is using a second seed, a two-shot is using a third seed, etc.
I personally recommend using no more than a Two-Shot, that is to say, three seeds. Personally, I use seeds 1000, 1001, and 1002. But you can use any numbers you like, like 69, 420, and 42069.
It's critically important not to change your prompts trying to bend them around the merge. Have faith in your prompts, and bend the merge so that it fits your prompts.
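If you run the webui with the --api flag, you can even script this validation loop. Below is a rough sketch that just builds the request payloads for a two-shot run (seeds 1000-1002) using the settings from my prompts. The field names are my best understanding of A1111's /sdapi/v1/txt2img endpoint; treat them as assumptions and check them against your install before relying on this.

```python
# Builds one txt2img payload per validation seed (a "two-shot" = 3 seeds).
# Field names follow A1111's /sdapi/v1/txt2img API -- verify against your
# own webui version, as the API can change.
def validation_payloads(prompt, negative, seeds=(1000, 1001, 1002)):
    base = {
        "steps": 35,
        "sampler_name": "DPM++ 2M Karras",
        "cfg_scale": 7,
        "width": 512,
        "height": 768,
        "enable_hr": True,
        "hr_scale": 1.5,
        "hr_second_pass_steps": 7,
        "hr_upscaler": "4x-UltraSharp",
        "denoising_strength": 0.4,
    }
    return [
        dict(base, prompt=prompt, negative_prompt=negative, seed=seed)
        for seed in seeds
    ]

payloads = validation_payloads("1woman, beach, ocean, palm trees", "bangs")
# Each payload could then be POSTed to http://127.0.0.1:7860/sdapi/v1/txt2img
```

This is strictly optional; the X/Y Plot approach later in this guide gets you the same comparison grid without writing any code.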
Management Advice: Zero-Shot = Validation with one seed. One-shot = Validation with two seeds. Two-Shot = Validation with three seeds, and so on. I recommend Two-shot validation. Don't change your prompts. Change the merge.
Define your target style
If you really want success, you need to think clearly about what you want to include in your merge. For example, if you want an anime merge, do you want it to lean more gritty like an 80s anime, or do you want it to be more clean and pristine like something from modern anime? If you want a photo merge, do you want to lean heavily into realism, or do you want a lot of art/expression?
A successful and interesting merge is unlikely to be a simple blend of a couple of your favorite models. It's better to have a specific goal in mind for the merge, rather than just slapping models together and hoping for the best.
We will see an example of this in greater detail further in the guide, so don't worry about this too much now, but keep those ideas in your head.
Management Advice: Make sure you have at least some idea of the merge you're aiming for.
Before we go any further, we need to talk about Model Licenses. Just because you can merge models, doesn't mean you can publish them freely. If you are using someone else's work, you need to make sure you are following the license set forth by that model.
Case in point, epiCPhotoGasm, by epinikion. This is an excellent trained model, meaning that epinikion put in all the hard work themselves. I imagine then, they don't want this model to be merged into other photographic models to potentially undermine their hard work, although this is just my speculation. Whether I am right or not doesn't matter. What matters is the license on that model.
"No sharing merges". You can merge it all day long into your own personal merges, but those merges must stay personal. You cannot publish them.
Before you start thinking about models you want to include in your merge, you need to make sure they are available to be published. If you have no intention of ever publishing the model, then it will remain a personal merge, and that's ok.
Management Advice: Make sure your merges abide by the licenses of the original models.
Merge Approach 1 - Simple Blend & Sample Workflow
This approach is the easiest but also has the least control. It merges two or three models together "wholesale". In this example, we are only going to merge two models. With that, let's head on over to the Checkpoint Merger tab.
I am also going to take this time to show the basics of my merge workflow.
Here is the base setup:
First, I want you to notice these folders/directories in front of my model names. These are folders in my Stable Diffusion models folder that I use to organize my models. Notice how one of them is named "Merging". I strongly suggest you also make a "merging" folder and put all your merges into it. If you don't do this, you will dump a ton of merges into your main model folder and it will cause chaos.
Notice how I am naming the resulting merged model: RealisticVision-MeinaMix-WS-A50. Notice how this name correlates to the parameters used to create the merge. I also strongly suggest that you use some kind of naming strategy like this so you know exactly which merge is which.
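If you want to keep that naming convention mechanical, a trivial helper does it. This is purely a hypothetical convenience I'm sketching here, encoding model A, model B, the merge method (WS for Weighted Sum), and the multiplier as a percentage:

```python
# Builds a merge name like "RealisticVision-MeinaMix-WS-A50":
# model A, model B, merge method, and the multiplier as a percentage.
def merge_name(model_a, model_b, method="WS", multiplier=0.5):
    return f"{model_a}-{model_b}-{method}-A{round(multiplier * 100)}"

print(merge_name("RealisticVision", "MeinaMix"))  # RealisticVision-MeinaMix-WS-A50
print(merge_name("RealisticVision", "MeinaMix", "WS", 0.25))  # RealisticVision-MeinaMix-WS-A25
```

Whatever scheme you pick, the point is that the name alone should tell you exactly how the merge was made.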
Notice that the "Multiplier" is set to 0.5. This slider asks: "What percent of model B should model A absorb?" 0 is 0%. 0.25 is 25%. 0.5 is 50%. 0.75 is 75%. 1 is 100%.
In this example, we want RealisticVision to "absorb" 50% of MeinaMix. What is actually happening behind the scenes is that linear algebra math, using a multiplier of 0.5.
We are also using "Weighted Sum". This is the mechanism we are using to determine how much of each model we want, and it is what uses the value set in the Multiplier slider.
Don't worry too much about how these concepts work under the hood; they don't really matter. What does matter is understanding the end result. But first...
...some housekeeping. Make sure you save your model as safetensors, and float16. Leave "Copy config from" on the first bubble, and make sure you don't bake in a VAE (yet).
Now, I'll be real with you, I have no idea how long this will take on your computer. On my computer with a Ryzen 5900X and RTX 3070, it took 21 seconds. Either way, once your merge completes, you probably want to know how to use it.
Let's head over to the txt2img tab. Click this little button next to the Stable Diffusion checkpoint dropdown.
If you have never clicked that button, it refreshes the available models list. In fact, that idea is universal throughout the UI. That button just says "hey, refresh this list with anything new". When merging models, you are going to find yourself clicking these buttons a lot.
Let's select our shiny new Merging/RealisticVision-MeinaMix-WS-A50 model.
Now, let's fire up our validation prompts and see what we get. Again, I am using the Generic Neg Text (Low Weight) quality control block, because I don't want the quality control block to interfere with the image generation process, but I also don't want a garbage image.
Here is an example of what it looks like for me:
And with that, let's take a look at our validation prompt images. These are all Seed 1000.
For prompt #1:
Overall, it's not bad. It definitely captures a good amount of the prompt requirements. There are some things that are off, such as it isn't quite nighttime, but it's close enough. Also, her arms aren't behind her back, but it's still an ok image. Quite a lot of cleavage though. This is the maximum allowable cleavage I think I would accept.
For Prompt #2:
Her eyes are a little weird, but other than that, everything else seems anatomically correct except the hands at the top, and I can forgive that. Therefore I think it's an ok image. The overall aesthetic is lacking though.
For Prompt #3:
This is mostly meh. We are missing the cuffs, the rabbit tail, and the finger on the lip. We also have these stupid thigh bands again, but other than that, it's following what was prompted.
For Prompt #4:
This image is nearly flawless, given what the model had to work with. The only flaw is that the eyes are too green. There is also some weird artifacting in the left hand, but the hand itself looks correct. Not sure what is going on there.
As we can clearly see, the MeinaMix DNA is coming through with these images.
Let's go back and change the merge slider to 0.25, indicating that we want RealisticVision to absorb 25% of MeinaMix. Don't forget to change the name of the new merge model: Merging/RealisticVision-MeinaMix-WS-A25.
Then, let's make another merge, setting the merge slider to 0.75.
We should now have these three models:
Management Advice: For generating simple merges, Checkpoint Merger is fine, but it's unlikely to give incredible results, as we will see in the next section.
Validation through X/Y Plots
The utility of X/Y Plots for validating merges cannot be overstated. There is simply no better way to validate a merge, so let's learn how to do it.
Remember when I suggested that you add your validation prompts to your styles.csv? Well, now we are going to make a variant of those to include the Generic Neg Text (Low Weight) quality control words. If you didn't add them to your styles.csv before, here they are again with the Generic Neg Text (Low Weight) included.
👔️ Prompt #1 X/Y: Corporate Executive,"1girl, alluring, revealing, hourglass figure, (soft smile:1.2), blonde, hair up in bun, (corporate executive:1.3), (navy blazer:1.2), (pencil skirt:1.1), (silk blouse:1.1), detailed background, balcony, city night life, (water front, river:1.2), bridge, night time, modern city skyline, (facing forward:1.2), (panorama:1.3), skyscrapers, glass buildings, bustling city, (looking at viewer:1.4), standing, (arms behind back:1.3), (cleavage:1.3)","nude, naked, nsfw, nipples, vagina, pussy, topless, bare breasts, (fat:1.25), (sly:1.5), (sleeveless:1.3), (side profile:1.3), (simple background:1.4), (worst quality:1.5), (low quality:1.5), (normal quality:1.5), (b&w:1.2), (black and white:1.2)"
🛌️ Prompt #2 X/Y: Nude on the bed,"(cowboy shot:1.4), 1girl, (nude:1.3), (naked:1.3), (nsfw:1.3), hourglass figure, large breasts, busty, (wide hips:0.85), thin waist, (shoulder length hair:1.2), redhead, (hazel eyes:0.75), makeup, mascara, rosy cheeks, smokey eyes, lipstick, (beautiful, gorgeous:1.1), content, relaxed, soft smile, (looking at viewer:1.4), lying down on back, face up, with pillow, arms up, arm pits, hands behind head, sensual, erotic, detailed background, luxurious bedroom, silk sheets, cosy, inviting","(cowboy:1.3), (medium shot:1.2), (clothing:1.3), (fat:1.25), (ugly, ugly face, average face, imperfect complexion:1.3), (simple background:1.4), (worst quality:1.5), (low quality:1.5), (normal quality:1.5), (b&w:1.2), (black and white:1.2)"
🐰️ Prompt #3 X/Y: Playboy Bunny,"1girl, petite, slim waist, small breasts, (shoulder length hair:1.2), ponytail, brunette hair, (black playboy bunny costume:1.3), (satin bodice:1.2), (bandless black nylon pantyhose:1.4), (bunny ears:1.3), (detached collar:1.2), (bow tie:1.2), (french satin wrists cuffs:1.4), white cuffs, (bare shoulders:1.3), (strapless:1.3), (bunny tail:1.6), (high heels ankle strap:1.1), makeup, mascara, rosy cheeks, smokey eyes, lipstick, (beautiful, gorgeous:1.1), flirty, smirk, tease, playful, naughty, (looking at viewer:1.4), sitting, (legs crossed:1.3), finger on lip, detailed background, night, midnight, night sky, luxurious night club, dim mood lighting, accent lighting, high end, upscale, posh, fancy ornate bar","fitness, toned, (large breast:1.3), (wide hips:1.3), (holding cocktail:1.3), (garters:2), (thigh highs:2), (thigh band:2), (thigh strap:2), (stay-ups:2), (hold-ups:2), (stockings:2), (fishnets:2), (tassles:1.3), (intricate leotard:1.1), (latex:1.3), (opera gloves:1.5), (detached sleeves:1.5), (bardot:1.4), (gloves:1.4), lingerie, zippers, (ugly, ugly face, average face, imperfect complexion:1.3), (pussy:1.4), (simple background:1.4), (worst quality:1.5), (low quality:1.5), (normal quality:1.5), (b&w:1.2), (black and white:1.2)"
👙️ Prompt #4 X/Y: Low word count,"1woman, long auburn hair, hair-part, hazel eyes, turquoise forest-green bikini soft smile, beach, ocean, palm trees","bangs, (worst quality:1.5), (low quality:1.5), (normal quality:1.5), (b&w:1.2), (black and white:1.2)"
Now let's go back to the txt2img tab, and scroll to the bottom.
From the Scripts dropdown, select "X/Y/Z plot", and make yours look like mine. If you are using your own prompts, you can use them here.
Then scroll back to the top.
Make sure to set Hires steps to something small like 8, and Upscale by to around 1.5. Otherwise your plot will take forever.
On my computer this took around 10 minutes. This was also an exhaustive plot. Normally we won't do this; we will target the specific parameters we are interested in so we don't waste time generating a ton of useless images in the plot.
Seeing the image breakdown like this is incredibly valuable when you are searching for the right parameters to use for a merge. It also helps to identify problems that you might not otherwise notice. For example, RealisticVision seems to have issues with our prompts. Maybe a different photographic model would be more useful?
The painful truth is that this plot gives us all the knowledge we need about how these two models can be merged together. This is it. There's nothing else. If your desired merge is somewhere between these merges, then great, but chances are it's not. This is why a simple Weighted Sum blend in the Checkpoint Merger is unlikely to give you a great merge.
Management Advice: The value of X/Y/Z plots is immense. Use them to map out the different parameters you're considering for your merges.
Clearly Define Criteria For Success
What's in a style?
I have a confession: I don't like the vast majority of anime models on Civitai. Their styles are commonly rooted in the "minimalist" style of anime, where things like mouths and noses are drawn with a single line. If you have no idea what I am talking about, many of the images from CounterfitV3 showcase this. Now, I am not saying that CounterfitV3 is a bad model! Not at all! It's an amazing model.
I personally just don't find the style appealing.
You know what styles I do like? (Seed 1002)
touch of realism
heavy line weight
clean image aesthetic
rich warm colors
heavy line weight
rich warm colors
heavy line weight
touch of realism
These aren't the only anime models I enjoy, but I'm not about to make an exhaustive list.
To summarize what I like the most:
heavy line weight
a touch of realism
rich warm colors
As a matter of expediency, the style guideline I just outlined above will serve as the aesthetic we are aiming for with our merges for the rest of the guide.
Management Advice: Clearly define what your Acceptance Criteria are for a successful merge.
Let's take a look at Super Merger and just take it all in for a moment.
This looks way more intimidating than it actually is. First and foremost, it's essentially the same exact thing as Checkpoint Merger, but with more bells and whistles.
Let's start at the top and work our way down.
First, we see two big orange buttons: "Load settings from:" and "Clear Cache". The first one is pretty self-explanatory: you can save your settings to a file and then load them again. The other button clears the cache. What does that mean?
Look at the thing in the middle called "merged model ID". This is Super Merger's party piece. Normally, when you make a merge with Checkpoint Merger, you have to literally save it somewhere and then manually select it again. With Super Merger, it "saves" the last merge you made as an "anonymous merge", allowing you to generate images with it and instantly overwrite it as you dial in your desired settings. Best to leave this value at -1 for now.
The Clear Cache button deletes this anonymous model.
Hey look! It's our friend Weighted Sum from Checkpoint Merger. This does the same exact thing it did in Checkpoint Merger, only now it has access to way more calculation methods.
Weighted Sum only works with Model A and Model B. It totally ignores whatever you have selected in Model C. We already covered what Weighted Sum does in Approach 1, so we won't cover it again here.
Wait, Model C? Didn't Checkpoint Merger also have a Model C? Yes it sure did, but I didn't want to dive into that while we were taking our first baby steps.
Model C allows us to merge three models together in the same step. Sometimes this is desirable, sometimes not. It's not good or bad; it's just another tool in the toolbox.
"Add difference", "Triple sum", and "sum Twice" use all three models for their calculations, meaning if you have not selected anything in Model C, Super Merger will yell at you.
Weighted Sum: A simple weighted average of the two models' weights, controlled by Alpha. That's it.
Here is an example using MistoonSapphire and StingerMix. We can clearly see both models' influences. However, we lost a big part of MistoonSapphire, which is the nice dark lines.
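If you like seeing things as code, here is a minimal sketch of what Weighted Sum does per weight. This is my simplification, treating a model as a plain dict of numbers rather than the tensors a real checkpoint holds; the per-weight math is the same idea.

```python
def weighted_sum(model_a, model_b, alpha):
    # Alpha = 0.0 keeps Model A untouched; Alpha = 1.0 is pure Model B.
    return {k: (1 - alpha) * model_a[k] + alpha * model_b[k]
            for k in model_a}

# Toy example: two "models" with a single weight each.
a, b = {"layer.weight": 1.0}, {"layer.weight": 3.0}
print(weighted_sum(a, b, 0.5))  # → {'layer.weight': 2.0}
```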
Add Difference: Take all the weight values of Model B and subtract the weight values of Model C. Then take the resulting difference and add it to Model A's weights. But before you add it to A, scale its strength by Alpha (the slider from 0.0 to 1.0). This allows you to attempt to limit the merge to just the differences between two models. In other words: how much of this "difference" should Model A absorb?
As an example, let's go back to where I said I liked StingerMix, MistoonSapphire, and GalenaREDUX. Let's say I only wanted the difference between Mistoon and Galena. Since Mistoon and Galena both have dark lines, the difference would mostly be how the two look in terms of image texture. But if you look at the image below and compare it to the Weighted Sum image above, you will notice that the image below is missing the texture I am referring to, while the Weighted Sum possesses it.
Here's my problem with Add Difference: I just don't find it that useful. The only time you're going to get a difference that is really noticeable is when you are merging two wildly different models, but that just leads to instability. Most of the time you're merging similar models, so Add Difference just isn't impactful enough. In fact, here is the resulting image using an Alpha of 0.5:
stingermix_v40 + (mistoonSapphire_v20 - galenaREDUX_v20) * 0.5
As you can see, very little has changed from Weighted Sum. And not just that, what did change isn't easily predictable either. Add Difference can be useful if you use Model A as the base model, Model B as the target model, and Model C as the same model as A. But the results are very similar to a generic Weighted Sum merge.
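In code form, Add Difference is just one extra subtraction. Another toy sketch under the same dict-of-numbers simplification (not SuperMerger's actual implementation):

```python
def add_difference(model_a, model_b, model_c, alpha):
    # (B - C) is the "difference"; Model A absorbs `alpha` of it.
    return {k: model_a[k] + alpha * (model_b[k] - model_c[k])
            for k in model_a}

# Same shape as the formula above:
# stingermix + (mistoonSapphire - galenaREDUX) * 0.5
a, b, c = {"w": 1.0}, {"w": 4.0}, {"w": 2.0}
print(add_difference(a, b, c, 0.5))  # → {'w': 2.0}
```

Note that if B and C are similar, (B - C) is close to zero everywhere, which is exactly why the method often changes so little.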
Triple Sum: This is just a straight sum of all three models' weights. You aren't choosing "how much of Model B should Model A absorb". Nope, this is a straight sum composite. The Alpha and Beta sliders control how much of Model B and Model C you want, at the expense of Model A. An Alpha and Beta set to 0.33 each will give equal weight to all three models. An Alpha of 0.4 means I want 40% of Model B, a Beta of 0.4 means I want 40% of Model C, and the rest, 20%, is Model A.
Here is that Triple Sum merge:
stingermix_v40 x 0.2 + mistoonSapphire_v20 x 0.4 + galenaREDUX_v20 x 0.4
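That formula maps directly to code. Here is a toy sketch under the same dict-of-numbers simplification; with Alpha = 0.4 and Beta = 0.4, Model A's leftover share is 20%, matching the merge above.

```python
def triple_sum(model_a, model_b, model_c, alpha, beta):
    # Alpha is Model B's share, Beta is Model C's share;
    # Model A gets whatever is left over.
    a_share = 1 - alpha - beta
    return {k: a_share * model_a[k] + alpha * model_b[k] + beta * model_c[k]
            for k in model_a}

# Isolate Model A's contribution: A is 1.0, B and C are 0.0.
a, b, c = {"w": 1.0}, {"w": 0.0}, {"w": 0.0}
merged = triple_sum(a, b, c, 0.4, 0.4)  # A's share comes out to roughly 0.2
```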
This is obviously a great merge and a clear step in the right direction for our desired aesthetic, but it's lacking the touch of realism I want. It's also missing a "clarity of image"; I want the style to be striking, not cluttered. Well, with only three model slots to work with, I am out of luck (at least until Approach 2).
Triple Sum's biggest flaw is that if you are merging three similar models together, they kinda just blur together. There isn't a whole lot of fine-tuned control unless you slide Alpha and Beta to the extremes, but then you run into the problem where your merge is 90% one model, which leads to some incredibly unpredictable results.
Here is a Triple Sum where I set Alpha to 0.7 and Beta to 0.15. So it's 70% Mistoon, 15% Stinger, and 15% Galena.
As you can see, it's very nearly the same as the previous one.
But here is another merge where I swapped out Galena for RealisticVision just to get a touch of that realism I wanted.
Here Alpha is 0.7 and Beta is 0.15. The calculation looks like this:
stingermix_v40 x 0.15 + mistoonSapphire_v20 x 0.7 + realisticVisionV51_v51VAE x 0.15
So its 15% Stinger, 70% Mistoon, and 15% RealisticVision.
Remember, this is 70% Mistoon! 70%! Does this look like 70% Mistoon to you?
Sum Twice: If you're going to merge three models together and you want some degree of control, this is the method I recommend. Sum Twice is basically two iterations of Weighted Sum. The first iteration asks "how much of Model B should Model A absorb?" Then: how much of Model C should the new Model A absorb?
If you set Alpha to 0.5 and Beta to 0.33, you can think of that as 1/3, 1/3, and 1/3. This is an oversimplification, but it's mostly accurate. Don't worry if that math doesn't make sense to you. I genuinely suggest you play with the Alpha and Beta sliders and see for yourself. The worst case scenario is you make a bad merge and have to delete it later.
Going off the previous example, if I set Alpha to 0.5 and Beta to 0.33, I get a merge that is very similar to Triple Sum, which I'm not going to show here, as it's nearly identical.
Instead, here is a merge where I set Alpha to 0.5 and Beta to 0.1. In other words, I want StingerMix to absorb 50% of mistoonSapphire, and then I want the result of that first merge to absorb 10% of Galena.
(stingermix_v40 x 0.5 + mistoonSapphire_v20 x 0.5) x 0.9 + galenaREDUX_v20 x 0.1
As you can see, we lose a lot of the Galena DNA here, which is expected as we only took 10% of it. If we swing Beta all the way to 0.9, meaning we take 90% of Galena, you get this:
(stingermix_v40 x 0.5 + mistoonSapphire_v20 x 0.5) x 0.1 + galenaREDUX_v20 x 0.9
This merge is clearly 90% Galena.
As we can see, Sum Twice has a much higher degree of control.
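To make the two-step nature concrete, here is a toy sketch of Sum Twice (dict-of-numbers simplification again), including the Alpha = 0.5, Beta = 0.33 "thirds" case from above:

```python
def sum_twice(model_a, model_b, model_c, alpha, beta):
    # Step 1: Model A absorbs `alpha` of Model B.
    ab = {k: (1 - alpha) * model_a[k] + alpha * model_b[k] for k in model_a}
    # Step 2: that result absorbs `beta` of Model C.
    return {k: (1 - beta) * ab[k] + beta * model_c[k] for k in ab}

# Final shares with alpha=0.5, beta=0.33:
# A: (1-0.5)*(1-0.33) = 0.335, B: 0.5*(1-0.33) = 0.335, C: 0.33
a, b, c = {"w": 1.0}, {"w": 0.0}, {"w": 0.0}
merged = sum_twice(a, b, c, 0.5, 0.33)  # A's final share: roughly 0.335
```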
These are the actual mathematical functions used to calculate the weights of each merge. In the interest of not putting you to sleep, I'm going to go over these rapid fire, giving you the general, yet oversimplified, idea of each one. If you really want to deep dive into these, I suggest you read this from the Super Merger extension creator.
normal: Literal summation: A+B+C. Useful for simple blends when you want to "smear" two models together.
cosineA: Uses something called cosine approximation. This is a fancy way of saying it uses the trigonometric cosine function to help minimize loss when merging weights. The A means it favors reducing loss relative to Model A. This function is "opinionated" about how the merge should be done. Exclusive to Weighted Sum, so it ignores Model C and Beta.
cosineB: Same as cosineA, but favors Model B.
trainDifference: Takes the difference between Model B and Model C and pretends that it's a LoRA. It takes this LoRA and then "trains" it into A. This allows for a much more nuanced merge, where fine details can be retained. Exclusive to Add Difference, so it ignores Beta. Alpha determines how strong the "LoRA" is. Oversimplified, but close enough.
smoothAdd/MT: Not recommended. Uses a Gaussian filter or Median filter to help smooth out problems that can arise from Add Difference. Not useful because it is horrifically slow; it is orders of magnitude slower than anything else here. There are better ways. To give perspective, a normal merge with Weighted Sum/Add Difference/etc. takes my computer maybe 10 to 15 seconds. A merge with smoothAdd takes over 7 minutes. Also, the results are always garbage. Stay away. I legitimately think either I am missing something or this is broken.
tensor/2: There is no easy way to explain this one. It's just an alternative weighted sum: it does the sum on the actual tensors instead of the weights. Don't worry if you don't understand that. Just remember it's a special alternative "weighted sum".
self: I cannot get this to work myself, and the results are always dead, so I guess it's more trouble than it's worth.
Now I know what you're thinking: "OK, so which ones should I use?"
Normal, cosineA, and trainDifference are all useful. I personally wouldn't bother with the others.
Use Normal when you want to blend models together.
Use cosineA when you want to blend models together in a more "precise and sterile" way which can be useful for isolating a model's nuances.
Use trainDifference when you definitely want to keep model nuances.
There are some other things to cover with Super Merger, but we will get to those later. For now, let's...
Merge Approach 2 - Pyramid Summation
Well, if the models listed just above had all those qualities between them, why don't we try a merge of all of them? The only question is... how? Both Checkpoint Merger and Super Merger only have three slots for models.
Enter the "pyramid" strategy. Now, this strategy is probably pretty obvious, but let's clearly go over it.
Model A + Model B = Model 1
Model C + Model D = Model 2
Model E + Model F = Model 3
Model 1 + Model 2 + Model 3 = Model 4
In other words, the merge order looks like a pyramid (sorry mobile users):
                       Model 4
                          |
     Model 1      +    Model 2      +    Model 3
        |                 |                 |
(Model A + Model B) (Model C + Model D) (Model E + Model F)
You merge base models into pairs (or triplets, however you want to do it), then you keep merging the results together until no pairs remain and you are left with one model at the end.
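The pyramid can be sketched in a few lines of Python. This is a toy sketch with single-number "models" and equal weights, not real checkpoints; the second-tier alphas (0.5, then 1/3) are chosen so every base model ends up with equal representation.

```python
def weighted_sum(a, b, alpha):
    return {k: (1 - alpha) * a[k] + alpha * b[k] for k in a}

# Toy stand-ins for six base models.
model_a, model_b = {"w": 0.0}, {"w": 1.0}
model_c, model_d = {"w": 2.0}, {"w": 3.0}
model_e, model_f = {"w": 4.0}, {"w": 5.0}

# Tier 1: merge the base models into pairs.
model_1 = weighted_sum(model_a, model_b, 0.5)
model_2 = weighted_sum(model_c, model_d, 0.5)
model_3 = weighted_sum(model_e, model_f, 0.5)

# Tier 2: merge the pairs until one model remains.
model_4 = weighted_sum(weighted_sum(model_1, model_2, 0.5), model_3, 1 / 3)
```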
Let's see it in action.
But first we need to do a tiny bit of setup.
If you expand this section you see this:
"If blank or set to 0, parameters in the "txt2img" tab are used.
batch size, restore face, hires fix settings must be set here"
This text basically says "I'll just use whatever you have set up in txt2img". It clearly says that Hires fix needs to be set in SuperMerger, but this is outdated and no longer the case.
Therefore, let's go back to the txt2img tab and make sure we have Prompt #2: Low Word Count and Generic Neg Text (Low Weight) selected in the styles box.
Make sure the Seed is set to 1000, or whatever your go-to seed is.
Make sure your Hires. fix is enabled, and is set to something comfortable and fast for your PC.
I recommend Hires Steps = 8, and Upscale by = 1.7. Here is what it looks like for me.
And with that out of the way, let's go back to the Super Merger tab and get to work.
Alright, well, why don't we just try a straight-up merge of all five of those models in order?
So let's see: we have StingerMix, MistoonSapphire, GalenaREDUX, CamellaLine, and GhostMix.
But wait, which models should we merge into which? Let's try models that share similarities.
So that's probably (StingerMix + GhostMix), and we could probably do a triplet of (Mistoon + Galena + Camella), and then merge the two results together.
I have no idea what to set Alpha and Beta to, so let's just keep them at Alpha = 0.5 and Beta = 0.33 so all models have equal representation in the final merge.
Up first, is (StingerMix + GhostMix) by Weighted Sum.
Make sure these boxes are checked:
Make sure you name your model Merging/stingerGhostWSA50 or use whatever naming strategy you like, but make sure the name makes sense.
And we get this image:
stingermix_v40 x 0.5 + ghostmix_v20Bakedvae x 0.5
This image has a clean aesthetic, no minimalism, and rich shadows.
Let's do Mistoon + Camella + Galena next.
Remember, this is a triplet, so we want to use Merge Mode "sum Twice", with Alpha at 0.5 and Beta at 0.33. If my UI looks like it shuffled a bit, it's because I have to resize it constantly to get a good screenshot that fits in this article.
Keep all the save settings the same but name it Merging/mistoonCamilliaGalenaSTA50B33
Click Merge & Gen, and we get this image:
(mistoonSapphire_v20 x 0.5 + camelliamixLine_v2 x 0.5) x 0.67 + galenaREDUX_v20 x 0.33
This merge has the heavy linework and rich shadows. It doesn't have warm colors, or what I would consider a clean aesthetic.
Let's see what the final merge gives us.
I personally use dashes to separate merges in a filename. It looks like this:
And we get this image:
I mean, it's definitely a merge of the two, but it's not quite what I am looking for.
Let's see what the range is for merging these two models using an X/Y plot.
Scroll down a bit and open the XYZ Plot menu:
From there, make sure the X type is set to alpha, and the Sequential Merge Parameters are set to
And then click the button just below called "Sequential XY Merge and Generation".
This will take some time, but not nearly the same amount of time it took us in Approach 1.
Here is the resulting Grid:
In my opinion, between 0.75 and 0.9 looks the best, but this merge isn't particularly interesting to me. It reminds me of a muted, worse version of Galena. I mean it's kinda good, but it's missing that something special that makes a model pop.
Here is the full X/Y/Z Validation Prompt for Merging/stingerGalenaWSA50-mistoonCamelliaGalenaSTA50B33-WSA50
So, by now you might be thinking, "Well, why don't I just X/Y plot the first couple of merges we did and see what I can get there? Maybe that will have some impact?" That's an excellent intuition, but the problem runs deeper than that. Doing a merge in this fashion is just going to cause all these models to blur and smear together almost no matter how you end up merging them. It's the same problem as Triple Sum. Sure, you can twist the Alpha and Beta around and you might be able to eke out some more quality, but ultimately it's not likely to work like you intend. However, if the style you want is literally just a massive blend of a ton of models, this is the approach for it.
Management Advice: Merging models via Pyramid Summation is only useful if you are quite literally just trying to blend a bunch of models together.
Merge Approach 3 - Fold In Method
The Fold In Method is pretty straightforward. It's where you add one model to the merge at a time, precisely controlling how much of that model you want. Ultimately it's the same thing as Pyramid Summation, but it's more incremental in its approach, offering more control.
Now that we know the basics of Super Merger, I'm not going to continue with a ton of screenshots. Instead, I'll just walk you through what to change, and only use screenshots for key pieces. It's the exact same stuff as Approach 2 - Pyramid Summation.
Let's get started.
So once again, let's think about those five original models. They were: StingerMix, MistoonSapphire, GalenaREDUX, CamellaLine, and GhostMix.
If I am going to add one model at a time, then I need to pick one model to act as a "base".
Out of all of those models I listed, I like MistoonSapphire the most, so lets choose that.
In the Super Merger tab, set Model A to MistoonSapphire.
Alright, so which model is first up on the chopping block? Well, let's try a zipper approach. Since StingerMix and GhostMix have more realism than Galena and Camella, let's fold them in, one from each category. Specifically:
MistoonSapphire + StingerMix + Galena + GhostMix + Camella, in that order.
By the way, Pyramid Summation and the Fold In Method aren't official workflows. They are workflows I made up myself, and you can change them however you want.
The reason the Fold In method is valuable is that each step always involves just two models, so you can perform XYZ plots quickly. Just pick the best result, and go again.
Let's see it in action.
Let's do an XYZ Plot between MistoonSapphire and StingerMix. Don't forget to set Model B to StingerMix.
To me, it looks like somewhere between 0.1 and 0.25 is the best first step. Any higher an Alpha, and we start to really lose the MistoonSapphire identity. Also, the shadows look best at 0.25.
Let's save a Weighted Sum merge using an Alpha of 0.2, indicating we want Mistoon to absorb 20% of Stinger.
Name it Merging/mistoonStingerWSA20. Let's do a Merge & Gen to make sure that the resulting image is what we expect.
We lost a significant chunk of the linework that Mistoon had originally, so let's see if we can get some of that back with the next merge.
Change Model A to your newly created Merging/mistoonStingerWSA20, change Model B to Galena, and simply scroll down and click Sequential XY Merge and Generation. All the parameters should be the same as last time.
Honestly, 0.5 seems like the best next step. We still retain the clean aesthetic from Mistoon and Stinger while getting some of Mistoon's original identity back, and now we have our rich shadows.
Let's save this as Merging/mistoonStingerWSA20-galenaWSA50, and do another Merge & Gen to make sure the resulting image is what we want.
Looks like we are trending in the right direction. Let's continue.
By now you know the drill: set Model A to our newly created Merging/mistoonStingerWSA20-galenaWSA50 and Model B to the next model on the list, GhostMix, and hit that Sequential XY Merge and Generation again.
Man, we lose a lot of the DNA we just built up after 0.25. Let's play it safe and just take 15% of GhostMix. Save it as Merging/mistoonStingerWSA20-galenaWSA50-ghostWSA15, and Merge & Gen.
We took a hit to the linework again, but that's OK because we have CamellaLine up last. Also, the aesthetic is starting to "flatten out" into a generic anime merge, as we are running into the same problem as Approach 2 and Triple Sum before: these models are once again starting to smear together. At least now, it's easy to predict how they will smear.
Let's merge the last model, Camella.
Well, if we want to get our strong linework back, our merge is going to need 30% to 50% of Camella.
Let's save the merge at 33% and see what we get. Name it Merging/mistoonStingerWSA20-galenaWSA50-ghostWSA15-camellaWSA33. Merge & Gen.
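The whole fold-in chain we just walked through (20% Stinger, then 50% Galena, then 15% Ghost, then 33% Camella) can be sketched as one loop. Toy dict-of-numbers models again, not real checkpoints:

```python
def weighted_sum(a, b, alpha):
    return {k: (1 - alpha) * a[k] + alpha * b[k] for k in a}

# Toy single-weight stand-ins for the five models in this section.
mistoon, stinger = {"w": 1.0}, {"w": 2.0}
galena, ghost, camella = {"w": 3.0}, {"w": 4.0}, {"w": 5.0}

# Fold-in: start from the base and absorb one model at a time,
# using the alpha picked from each XYZ plot along the way.
merged = mistoon
for model, alpha in [(stinger, 0.20), (galena, 0.50),
                     (ghost, 0.15), (camella, 0.33)]:
    merged = weighted_sum(merged, model, alpha)
```

The loop also makes the method's trade-off visible: each new fold dilutes everything that came before it, which is why the early Mistoon linework kept fading.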
This is our final product. I mean, it's an OK merge, but it's not what I envisioned in my mind.
Here is the XYZ Plot:
Management Advice: The Fold In method is essentially the same thing as Pyramid Summation, but its incremental focus means you have a greater degree of control for how you merge base models in.
Now, I know what you're thinking...
"Ok then RestlessDiffusion, stop torturing me with these failed attempts, and tell me what you envisioned in your mind before I close this tab..."
This is what I envisioned:
Great linework. Gorgeous clean aesthetic. Rich Shadows. Absolutely no minimalism. Just a touch of realism. And wonderful warm colors.
This is RestlessAnimance.
"RestlessDiffusion, is this entire guide one massive plug for your stupid anime model?"
Yes, yes it is.
Now let me show you how I created it.
But first you need to understand Merge Block Weights.
Management Advice: You should use RestlessAnimance. It's really good.
Merge Block Weights
Foundational knowledge you just gotta know
This is the hardest thing to grasp about merging models, but it's also critically important to understand if you want to create a successful merge in the next section, Approach 4 - Targeted Composite.
All of what you're about to read can be found in this excellent in-depth guide on the topic and I strongly suggest you read it. However, I'm going to attempt to summarize it in this guide.
Foundation #1: Each Stable Diffusion model has twelve (12) input layers, a single (1) middle layer, and twelve (12) output layers (INPUT 00-11, M00, and OUTPUT 00-11).
>>> Foundation #2: You can intentionally target specific layers with specific merge functions. I cannot over-stress how critically important this is. <<<
Foundation #3: The outer layers (not output layers) are focused on the fine details of the image. How are the tiny lines of this person's eyelashes drawn? How is the hair shaded? How deep are the shadows? What does the subsurface scattering look like? How do the textures reflect light?
Foundation #4: The middle layers are focused on specific concepts. How should the model draw this arm? This dress? This room interior? Is it photographic? Is it Anime?
Foundation #5: The inner layers (not input layers) are focused on core ideas. What does a human look like? What does a house look like?
Foundation #6: The depth of the layers is mirrored between input and output. Specifically, INPUT 00, 01, 02, 03 are outer layers, and OUTPUT 11, 10, 09, 08 are also outer layers. Again, this is explained in great detail in the guide linked above.
Mixing and matching these layers is the best way to create an intentional, targeted composite merge, and this is ultimately what this guide will teach. It is difficult to understand at first, but it will make more sense in the next section.
DISCLAIMER: Mixing and matching these layers is anything but an exact science. It essentially boils down to trial and error and adjusting the numbers. Don't get discouraged if you're stumbling around in the dark for a while trying to get these styles to play nicely. Remember your friend, XYZ Plots!
We will learn how to use MBW and MBW Presets in Approach 4.
Management Advice: Read that guide I linked. It's really good, and will give you a great starting point for what we are about to do.
Merge Block Weight Preset Deep Dive
What the heck is a MBW preset?
This is a gross oversimplification and borderline not even true, but this guide will consider "layers" and "blocks" to be the same thing. In reality, several different blocks make up each layer, but we aren't going into that.
A MBW Preset is a selection of Alpha (and/or Beta) weights that changes per layer being merged. If you read the guide I linked in the previous section, it lists all the presets in graph form. If you don't get anything else from that guide, then at least look at the graphs at the bottom.
For example, take GRAD_V. If you look at the graph for GRAD_V, we can see the inner layers are blue, meaning they are 100% Model A, and the outer layers are red, meaning they are 100% Model B. But the middle layers are a gradual change from Model A to Model B. OK, that's great, but what does that actually mean?
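Under the hood, a preset is nothing more than one alpha per block instead of one global alpha. Here is a toy sketch of that idea; the gradient values below are purely illustrative (not SuperMerger's actual GRAD_V numbers), and `block_of` is a made-up key-to-block mapping for the toy model:

```python
def mbw_merge(model_a, model_b, block_alphas, block_of):
    # Same weighted sum as before, but each weight key looks up the
    # alpha for its own block instead of using one global alpha.
    return {
        k: (1 - block_alphas[block_of(k)]) * model_a[k]
           + block_alphas[block_of(k)] * model_b[k]
        for k in model_a
    }

# Toy models with the "block index" baked into each key.
model_a = {"block_0.w": 0.0, "block_12.w": 0.0, "block_24.w": 0.0}
model_b = {"block_0.w": 1.0, "block_12.w": 1.0, "block_24.w": 1.0}

# Illustrative V-shaped gradient over 25 blocks:
# outer blocks 100% Model B, middle block 100% Model A, ramp between.
alphas = [1.0] * 4 + [0.5] * 8 + [0.0] + [0.5] * 8 + [1.0] * 4

def block_of(key):
    # e.g. "block_12.w" -> 12 (hypothetical mapping for the toy keys)
    return int(key.split("_")[1].split(".")[0])

merged = mbw_merge(model_a, model_b, alphas, block_of)
```

In the real extension the 25 alphas line up with IN00-IN11, M00, and OUT00-OUT11, and the block mapping is derived from the checkpoint's actual key names.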
Well, here is a big write-up I made after generating the results myself. And if this guide wasn't long enough before, I have included an example image for each MBW Preset. The image comes first, followed by its Preset and description.
The models used for this were
Model A: absoluteReality
Model B: meinaMix.
Output layers are more impactful for aesthetic.
Input layers are more impactful for composition.
Fine Details Layers: Minor, but important, aesthetic polish. Subsurface scattering, shadow textures, eyelashes, nipple textures, etc.
INPUT 00 -> OUTPUT 11
INPUT 01 -> OUTPUT 10
INPUT 02 -> OUTPUT 09
INPUT 03 -> OUTPUT 08
INPUT 04 -> OUTPUT 07
Major Details Layers: General Aesthetic and Flair, Photographic or Anime
INPUT 04 -> OUTPUT 07
INPUT 05 -> OUTPUT 06
INPUT 06 -> OUTPUT 05
INPUT 07 -> OUTPUT 04
INPUT 08 -> OUTPUT 03
INPUT 09 -> OUTPUT 02
Core Composition Layers: Pose, Outfit, Background, Base Elements of the Image
INPUT 10 -> OUTPUT 01
INPUT 11 -> OUTPUT 00
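The mirroring in the table above follows one simple rule: INPUT n sits at the same depth as OUTPUT (11 - n). As a tiny sanity check:

```python
# INPUT n pairs with OUTPUT (11 - n), per the INPUT -> OUTPUT table above.
def mirrored_output(input_index):
    return 11 - input_index

assert mirrored_output(0) == 11   # INPUT 00 -> OUTPUT 11
assert mirrored_output(4) == 7    # INPUT 04 -> OUTPUT 07
assert mirrored_output(11) == 0   # INPUT 11 -> OUTPUT 00
```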
GRAD_V: Blends Model B’s image details onto Model A’s composition, with healthy overlap.
GRAD_A: Blends Model B’s composition onto Model A’s image details, with healthy overlap.
FLAT_25: Model A absorbs 25% of Model B’s weight data. Mostly resembles Model A.
FLAT_75: Model A absorbs 75% of Model B’s weight data. Mostly resembles Model B.
WRAP08: Only Model A's fine details are overwritten by Model B's fine details. Composition remains A.
WRAP12: Model A's major details are combined (not blended) with Model B's details. Composition remains A.
WRAP14: Model A's major image texture is combined (not blended) with Model B's image texture. Composition remains A.
WRAP16: Model A's core aesthetic is combined (not blended) with Model B's core aesthetic. Composition remains A.
MID12_50: Model A retains its fine and major details, but it absorbs 50% of Model B's composition.
OUT07: Model A retains its core composition but its fine and major details are overwritten by Model B.
OUT12: Model A retains its core composition but the entirety of its aesthetic is overwritten by Model B.
OUT12_5: Like OUT12, but Model A also absorbs 50% of Model B's composition.
RING08_SOFT: Model A retains its fine details and core composition but Model B's major details and some composition overwrite Model A's. Result is a composition unique to model A or B. Aesthetic is heavily combined (not blended) together, seems to lean towards Model B.
RING08_5: Very similar to RING08_SOFT. Higher influence of Model B.
RING10_5: Very similar to RING08_SOFT. Still higher influence of Model B.
RING10_3: Very similar to RING08_SOFT. Very high influence of Model B.
SMOOTHSTEP: Model A retains most of its composition, but has some blend with Model B. Nearly all of Model B's aesthetic is absorbed by Model A. Has mild blending.
REVERSE-SMOOTHSTEP: Model A retains most of its composition, and aesthetic, but its aesthetic is slightly influenced by Model B.
SMOOTHSTEP*2: Model A retains its aesthetic, but almost entirely absorbs Model B's composition. Has mild blending.
R_SMOOTHSTEP*2: Somewhat similar to RING08_SOFT, but the aesthetic is almost entirely Model B, but with more blending.
SMOOTHSTEP*3: Model A's composition is combined with parts of Model B's. Model A's major and fine details are overwritten by Model B.
R_SMOOTHSTEP*3: Model B's composition is combined with parts of Model A's. Model B's major and fine details are overwritten by Model A.
SMOOTHSTEP*4: Similar to RING08_SOFT, but the aesthetic is heavily blended. Leans towards Model B's aesthetic.
R_SMOOTHSTEP*4: Model A's composition is overwritten by Model B's. Model A retains a few of its core details. Most of the aesthetic is Model B.
SMOOTHSTEP/2: Very similar to GRAD_A. Composition leans slightly more towards Model B.
R_SMOOTHSTEP/2: Very similar to GRAD_V. Composition leans slightly more towards Model A.
SMOOTHSTEP/3: Somewhat similar to RING08_SOFT. The unique composition is more exaggerated, and the aesthetic, while blended, leans more towards model A.
R_SMOOTHSTEP/3: Model A has big parts of its major details and composition combined with Model B. Model A retains its fine details. Aesthetic leans Model B.
SMOOTHSTEP/4: Similar to RING08_SOFT. Aesthetic is heavily Model B with fine details from Model A.
R_SMOOTHSTEP/4: Model A's composition is overwritten by Model B. Model A retains its major details. Model A's fine details are overwritten by Model B.
COSINE: Model A's and Model B's composition are blended together favoring Model A. Aesthetic heavily favors Model A.
REVERSE_COSINE: Model A and Model B's composition are blended together, very slightly favoring Model B. Aesthetic heavily favors Model B.
TRUE_CUBIC_HERMITE: All of Model A and all of Model B are heavily blended together, slightly favoring Model B.
TRUE_REVERSE_CUBIC_HERMITE: All of Model A and all of Model B are heavily blended together, slightly favoring Model A.
FAKE_CUBIC_HERMITE: Nearly identical to TRUE_CUBIC_HERMITE. Very slight difference in fine details.
FAKE_REVERSE_CUBIC_HERMITE: Nearly identical to TRUE_REVERSE_CUBIC_HERMITE. Very slight difference in fine details.
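If you're curious where the SMOOTHSTEP family comes from, here is a rough sketch of the underlying math. This is my own illustration, not SuperMerger's actual preset code; the 25-block count and the exact way the *N variants stack are assumptions on my part.

```python
# Hypothetical sketch of how SMOOTHSTEP-style per-block weights could be
# generated: the classic smoothstep curve 3t^2 - 2t^3 evaluated across the
# block positions, optionally applied repeatedly (SMOOTHSTEP*2, *3, ...)
# or mirrored (the REVERSE/R_ variants).

def smoothstep(t: float) -> float:
    return 3 * t**2 - 2 * t**3

def preset(repeats: int = 1, reverse: bool = False, blocks: int = 25) -> list[float]:
    weights = []
    for i in range(blocks):
        t = i / (blocks - 1)          # 0.0 at the first block, 1.0 at the last
        for _ in range(repeats):      # SMOOTHSTEP*2 applies the curve twice, etc.
            t = smoothstep(t)
        weights.append(round(1 - t, 3) if reverse else round(t, 3))
    return weights
```

The more times the curve is applied, the sharper the S shape, which is why the *N presets separate the two models' contributions more aggressively; the /N variants go the other way with a flatter curve.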
If you want to see these in an X/Y Plot yourself, select mbw alpha for the X type (or Y type, etc.), and drop this block, exactly as it is, into the Sequential Merge Parameters.
GRAD_V GRAD_A FLAT_25 FLAT_75 WRAP08 WRAP12 WRAP14 WRAP16 MID12_50 OUT07 OUT12 OUT12_5 RING08_SOFT RING08_5 RING10_5 RING10_3 SMOOTHSTEP REVERSE-SMOOTHSTEP SMOOTHSTEP*2 R_SMOOTHSTEP*2 SMOOTHSTEP*3 R_SMOOTHSTEP*3 SMOOTHSTEP*4 R_SMOOTHSTEP*4 SMOOTHSTEP/2 R_SMOOTHSTEP/2 SMOOTHSTEP/3 R_SMOOTHSTEP/3 SMOOTHSTEP/4 R_SMOOTHSTEP/4 COSINE REVERSE_COSINE TRUE_CUBIC_HERMITE TRUE_REVERSE_CUBIC_HERMITE FAKE_CUBIC_HERMITE FAKE_REVERSE_CUBIC_HERMITE
How to Perform a Merge with a MBW Preset
Now that your eyes have sufficiently glazed over, and you're bored to tears, let's find out how to actually use this stuff!
Head over to the SuperMerger tab, scroll down to the bottom, and open the Merging Block Weights tab, and you should see this:
Boy does this look intimidating or what? Trust me, this looks much, MUCH, more difficult to use than it actually is.
See this text box? This is the Alpha textbox. There is a Beta textbox on the Beta tab, but we will ignore that, as we are only concerned with Alpha for now.
Just underneath the Alpha box, we see a "Preset" tab, with a "Select preset" box. Select GRAD_V from the dropdown.
Notice how all the weight sliders moved to something like this? Now notice how none of the numbers in the Alpha box above changed, as they are all still 0.5.
The Alpha textbox, with all the numbers in it, holds the actual values to be used in the merge. The sliders don't actually do anything except show you what numbers you're targeting for each specific layer.
Click the "↑ Set Alpha" button. Notice how all the numbers fly up from the sliders into that textbox. The textbox goes [input layers][M00][output layers]. Forgetting to set the Alpha textbox is a mistake you will probably make a lot. Don't feel bad if you do; I still do it all the time.
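To make that textbox ordering concrete, here is a hypothetical parser for the alpha string. The 25-value layout (IN00 through IN11, then M00, then OUT00 through OUT11) follows the [input layers][M00][output layers] ordering described above for SD1.x models; the exact count is an assumption and can differ by model architecture and UI version.

```python
# Hypothetical helper: map the MBW alpha textbox string to named per-block
# weights, assuming the SD1.x layout of 12 IN blocks, M00, and 12 OUT blocks.

def parse_alpha(text: str) -> dict[str, float]:
    names = ([f"IN{i:02d}" for i in range(12)]
             + ["M00"]
             + [f"OUT{i:02d}" for i in range(12)])
    values = [float(v) for v in text.replace(",", " ").split()]
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} weights, got {len(values)}")
    return dict(zip(names, values))
```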
We just need to do one more thing, which is select the "use MBW" checkbox. Notice how the Alpha and Beta sliders are now gone. That's because the weights are now coming from the MBW menu down below.
That's basically it. Currently we are just using presets, but there's no reason you can't set those sliders to whatever you want. Now, when you do a merge, you still select whether you want Weighted Sum, Sum Twice, etc., but the actual Alpha weights are dynamic depending on the layer.
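Under the hood, a per-layer MBW merge is still just a weighted sum; only the alpha varies by block. Here is a minimal sketch under the assumption of SD1.x-style state-dict key names — the `block_of` helper is mine for illustration, not SuperMerger's actual code.

```python
import re

def block_of(key: str) -> str:
    # e.g. "model.diffusion_model.output_blocks.7.conv.weight" -> "OUT07"
    m = re.search(r"(input_blocks|output_blocks|middle_block)\.(\d+)", key)
    if not m:
        return "BASE"          # text encoder, VAE, time embeddings, etc.
    kind, idx = m.group(1), int(m.group(2))
    if kind == "middle_block":
        return "M00"           # MBW treats the whole middle block as one unit
    return ("IN" if kind == "input_blocks" else "OUT") + f"{idx:02d}"

def merge(theta_a: dict, theta_b: dict, alphas: dict, default: float = 0.5) -> dict:
    merged = {}
    for key, wa in theta_a.items():
        a = alphas.get(block_of(key), default)   # per-block alpha, not global
        merged[key] = (1 - a) * wa + a * theta_b[key]
    return merged
```

Real merge tools operate on tensors and handle mismatched keys, but the core idea is exactly this: one Weighted Sum formula, dispatched per block.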
You finally have all the pieces required to merge a model using...
Merge Approach 4 - Targeted Composite
Holy cow, are we done yet?
Yes, we are very close to being done.
But first, what is a targeted composite? Imagine that you can predict what a merge will look like because you know how MBW works on an individual layer basis. If you can get yourself to this level, then you will know with a high degree of certainty what each merge will produce, so you can spend less time in the trial and error phase generating X/Y plots, and more time targeting a composite of the things you want in your merge.
Here is a little quiz. Let's say I wanted you to generate a merge that had the look, feel, and overall aesthetic (linework, art, colors) of an anime model, but had the realistic composition of a photographic model. What layers would you need to change, and to what weights? If Model A was a realistic model, and Model B was your anime model, you should immediately be thinking of something like OUT12 and sliding the weights to 1. There is more than one correct answer here.
But let's try this.
What if you wanted your model to have a touch of realism (sound familiar?), but otherwise still be a pure anime model? Well, if you are still thinking of OUT12 and slamming those weights to 1, you're on the right track, but that's going to give the merge a big heaping dose of realism instead of a touch. No, what we want is something like OUT07, but we only want a touch, so the sliders should be set to something custom, like 0.25.
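As a quick illustration of that answer, here is a hypothetical helper that builds a 25-value alpha string which is 0 everywhere except a single OUT block, set to a gentle weight like 0.25. The 12-IN + M00 + 12-OUT ordering is the assumption used throughout this guide; the helper name is mine.

```python
# Hypothetical helper: an alpha string that nudges exactly one OUT block.
# Index math assumes 12 IN blocks and M00 come before the OUT blocks.

def touch_of(out_block: int, weight: float = 0.25) -> str:
    weights = [0.0] * 25
    weights[13 + out_block] = weight   # skip IN00..IN11 (12) and M00 (1)
    return " ".join(str(w) for w in weights)
```

`touch_of(7)` produces a string you could paste into the Alpha textbox to pull only OUT07 a quarter of the way towards Model B.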
Hopefully now, you can see the immediate value of MBW. I won't lie to you, developing this intuition comes with time, trial and error, and our good friend, X/Y Plots.
Anyway, enough talk. Let's create RestlessAnimace.
This is basically my final thought process during my time making merges, exploring the style I wanted.
So, what are my favorite anime models?
StingerMix, MistoonSapphire, GalenaREDUX, CamellaLine, GhostMix
Well, which model's base aesthetic do I like the most?
Ok then, why do I like the other models?
I like StingerMix and GhostMix for their clean aesthetic.
I like Galena and Camella for their linework and shadow textures.
Well, there is some definite overlap here. I want to cut out overlap as much as possible, if not altogether. I want to target specific things from the models I am merging, meaning I only need one source for each quality I need.
I like StingerMix more than GhostMix, so let's cut GhostMix.
I like Galena's look and feel more than Camella's, so let's drop Camella.
Now I have Mistoon as a base, and I want to somehow merge Galena's shadows and textures and StingerMix's clean aesthetic into it. Well, it sure would be cool if I could just add the parts that Galena and StingerMix have that Mistoon is missing. But I know if I do a straight Weighted Sum merge, it's going to smear together, and I don't want that. I want it to be nuanced.
I'll use TrainDifference. I want Galena and StingerMix to turn themselves into a single "lora", and then I want Mistoon to absorb that lora.
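For intuition, here is the classic "add difference" formula that Train Difference builds on: the delta between two models is extracted, like a LoRA, and added onto a base. Train Difference itself does something smarter per-weight inside SuperMerger, so treat this as a simplified sketch, not its actual implementation.

```python
# Simplified "add difference" sketch: merged = A + alpha * (B - C).
# B minus C isolates what B has that C doesn't; alpha controls how much
# of that delta the base model A absorbs.

def add_difference(theta_a: dict, theta_b: dict, theta_c: dict,
                   alpha: float = 0.5) -> dict:
    return {k: theta_a[k] + alpha * (theta_b[k] - theta_c[k]) for k in theta_a}
```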
How much should I take of each?
(Made a TrainDifference X/Y Plot with all MBW Presets)
(Made a TrainDifference X/Y Plot with alpha values of "0.1,0.25,0.5,0.75,0.9".)
I'll go through the plot looking for correctness of prompt and quality of image. First, all of these are great and similar to what I wanted, so it's time to be picky and do zero-shot validation. Now let's see, which of these results do I like the most?
Alpha of A=0.5
I'll generate all three of these merges and do a full txt2img (not SuperMerger) Validation X/Y/Z Plot, using all four prompts and my three seeds.
Well, that's surprising: I think the simple A=0.5 is probably the best, so let's go with that. Now it's time to give this a touch of realism. What better way to do that than to perform a MBW merge with a model like AbsoluteReality? I'll generate an X/Y plot with all the OUT layer Alpha values I think I would want.
(Made several X/Y plots with various MBW weights targeting the OUT layers and AbsoluteReality.)
Well, I think I'll go with this set of values:
0 0.25 0 0.25 0 0.25 0 0.25 0 0.25 0 0.25 0 0.25 0 0 0 0 0 0 0 0 M00
Let's put a finishing touch on this merge by using Adjust Settings. Out of everything in this guide, this is the easiest thing to understand. This is a group of 7 numbers that control the "fit and finish" of the merge. Back in the SuperMerger tab, if you open the Adjust Settings menu, you will see this:
I used these values:
These values mean "reduce noise/detail", "up the contrast a bit", "don't change the color tones".
And with that, the merge is finally baked in.
Here is the final validation X/Y/Z Plot.
Common Mistakes
Not having a clear idea of what you want your merge to look like; not defining the clear style goals you want to accomplish.
Avoid using big chunks of mega-popular models. In other words, don't make your merge 90% Dreamshaper and then think to yourself, "Man, this is such a great merge." Like, yeah, of course it's a "great merge"; it's 90% Dreamshaper...
Forgetting that the textboxes are the only values that matter. Sliders do not matter.
Not making enough X/Y/Z plots.
Not including the ending values in your X/Y/Z plots. For example, if you want an X/Y plot with alphas of 0.25, 0.5, and 0.75, the mistake is leaving out 0.1 or 0.9, or even 0 or 1.
If you made it all the way here, and you read through this entire thing, I sincerely, and deeply hope it was useful and worthwhile to you. I hope this guide has given you confidence to try your hand at creating some excellent model merges.
Thank you for your time, Dear Reader, and if you give me the privilege of your attention again, I hope to see you in my next guide.