This guide will give you advice from the express viewpoint of a beginner who has no idea where square one is. You will outgrow this advice as you tinker with A1111.
When I first started with A1111 and Stable Diffusion, I was always so frustrated that my images didn't look anything like the rest of the community's images, but I didn't know where to start. If you feel that way, this guide is for you.
Yeah, I'm not reading any of this, just give me the values, ok?
Copy paste string:
Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 7, Size: 512x768, Denoising strength: 0.4, Clip skip: 2, Hires upscale: 2, Hires steps: 15, Hires upscaler: 4x_NMKD-Siax_200k
If 4x_NMKD-Siax_200k can't be found, read the upscaler section.
If Clip skip is in the Override Settings box, read the clip skip section.
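If you would rather drive A1111 from a script than paste the string, the same values can be written down as a request body. This is only a minimal sketch: it assumes A1111 was started with the `--api` flag and that your version accepts these field names on its `/sdapi/v1/txt2img` endpoint. `build_txt2img_payload` is a hypothetical helper made up for this example, and nothing is actually sent anywhere.

```python
import json


def build_txt2img_payload(prompt: str, negative_prompt: str = "", seed: int = -1) -> dict:
    """Map the copy-paste values above onto txt2img request fields (assumed names)."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "seed": seed,                          # -1 means "pick a random seed"
        "steps": 35,                           # Steps: 35
        "sampler_name": "DPM++ 2M Karras",     # Sampler (newer versions may split sampler and scheduler)
        "cfg_scale": 7,                        # CFG scale: 7
        "width": 512,                          # Size: 512x768
        "height": 768,
        "enable_hr": True,                     # turn on Hires. fix
        "hr_scale": 2,                         # Hires upscale: 2
        "hr_second_pass_steps": 15,            # Hires steps: 15
        "hr_upscaler": "4x_NMKD-Siax_200k",    # must already be installed
        "denoising_strength": 0.4,             # Denoising strength: 0.4
        "override_settings": {"CLIP_stop_at_last_layers": 2},  # Clip skip: 2
    }


payload = build_txt2img_payload("1girl, medieval village")
print(json.dumps(payload, indent=2))
```

Keeping the values in one function like this makes it easy to change a single setting at a time, which is the same approach the rest of this guide takes.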
So you finally took the plunge and installed Stable Diffusion. You got the web console running and now you're staring at this: a blank canvas.
Ok, so now what? Suddenly, you are face to face with your own creativity, and you realize you don't know the steps to create the vision in your head. That's where this guide comes in. We aren't going to focus on inpainting, img2img, controlnet, or any of the other neat things you can do with Stable Diffusion. We need to walk before we can run.
If you are interested in inpainting, img2img, or controlnet, I highly recommend A13JM's guide - Making Images Great Again!
Everyone has their own workflow. My workflow will be different than your workflow, which will be different than some other person's workflow, and that's just fine. But when you're first starting out, there are so many settings and number values thrown at you all at once that you don't have any idea what does what, much less a well-defined workflow.
So, let's ignore all those settings and number values for now, and let's learn one setting at a time.
This guide assumes you have a pretty decent GPU to generate images, like an RTX 2080 or an RTX 3070 at minimum, and that you are running Automatic1111 locally.
Setup To Follow Along
For this guide I am also going to use the excellent model Rev Animated v1.2.2, as it's generally very forgiving and permissive.
Now, let me give you a positive and negative prompt so that you can follow along. Don't worry about the content of either of these prompts for this guide. This guide isn't going to focus on prompting, it's going to focus on the settings. This is just so you and I are on the same page.
1girl, medieval village, nobility, strapless purple dress, bare shoulders, blonde hair, hair up, messy bun, pendant necklace, castle balcony, (masterpiece:1.2), soft lighting, subsurface scattering, heavy shadow, (best quality:1.4), golden ratio, (intricate, high detail:1.2), soft focus
(disfigured:1.3), (bad art:1.3), (deformed:1.3),(extra limbs:1.3),(close up:1.3),(b&w:1.3), weird colors, blurry, (duplicate:1.5), (morbid:1.3), (mutilated:1.3), [out of frame], extra fingers, mutated hands, (poorly drawn hands:1.3), (poorly drawn face:1.3), (mutation:1.5), (deformed:1.5), (ugly:1.3), blurry, (bad anatomy:1.3), (bad proportions:1.5), (extra limbs:1.3), cloned face, (disfigured:1.5), out of frame, (malformed limbs:1.1), (missing arms:1.3), (missing legs:1.3), (extra arms:1.5), (extra legs:1.5), mutated hands, (fused fingers:1.1), (too many fingers:1.1), (long neck:1.5), Photoshop, video game, ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, extra limbs, extra legs, extra arms, disfigured, deformed, cross-eye, body out of frame, blurry, bad art, bad anatomy
These prompts, using Rev Animated and all default settings (simply refresh the A1111 tab to get the default settings), should generate something comparable to this image. It might be a different pose, and it might not be the same background, but it will still be a blonde woman, in a purple dress, in a castle. This is the image I generated:
At first glance, this image is pretty good (mainly thanks to how good Rev Animated is), but it very much lacks that "I know what I'm doing" feeling that you see with the rest of AI generated art. It's also kind of small because it's 512x512, so it's a little fuzzy in places as the sharpness isn't there.
Would you believe the image below uses the exact same prompt, and that the difference in quality is simply due to different settings? It's also not even the real image, as Civitai reduces the quality of the image I am allowed to paste into this guide. This is the full res image.
That is what this guide is going to teach: how to get the above quality by tweaking some settings.
Seed / Extra (Seed)
Going out of order for this first setting: the Seed. This has no effect on the quality of image generation, but it is important. The seed is the number that determines the starting latent noise your image is generated from. This is the "key" that allows other people to reproduce the same image you made, and it's why you can just keep hitting the "Generate" button and get different images.
The default value is -1 which means random. There are two buttons next to the seed. A dice button, and a recycle button.
The dice button means: "Make the seed random", and sets its value to -1.
The recycle button means: "Reuse the seed of the image we just generated", and places the previous image's seed into the seed box.
You can also enter in the seed manually.
Extra Seed just allows you to use the same primary seed, but add variance.
Beginner Advice: Leave the seed at -1 unless you're trying to generate someone else's image. Otherwise, you might chase yourself around in circles modifying the prompt without realizing that you forgot to change the seed back to -1 and the particular seed you were on just didn't want to play nicely with your prompt. Don't bother with Extra Seed for now.
>IMPORTANT< Follow Along Step: Set the Seed to 3640280682 to produce all the images in this guide. Press Generate once you have set the seed to 3640280682, and you should see the first image in this guide.
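To see why a fixed seed reproduces an image while -1 keeps giving you new ones, here is a toy sketch. It uses Python's own random number generator as a stand-in for the noise Stable Diffusion starts from, so the numbers are not what A1111 produces; `starting_noise` is a hypothetical name for illustration only.

```python
import random


def starting_noise(seed: int, size: int = 4) -> tuple[int, list[float]]:
    """Return the seed actually used and a few Gaussian noise values drawn from it."""
    if seed == -1:
        # -1 means "random": pick a fresh seed for this generation
        seed = random.randrange(2**32)
    rng = random.Random(seed)
    return seed, [rng.gauss(0.0, 1.0) for _ in range(size)]


# The same seed always gives the same starting noise
seed_a, noise_a = starting_noise(3640280682)
seed_b, noise_b = starting_noise(3640280682)
print(noise_a == noise_b)  # True

# A seed of -1 is swapped for a random seed each time you "Generate"
seed_c, noise_c = starting_noise(-1)
print(seed_c)
```

The recycle button corresponds to reading back the seed that was actually used (`seed_c` here) so that you can type it in again later.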
Sampling Method & Sampling Steps
The sampling method is straightforward enough. This is the algorithm the Stable Diffusion AI uses to chip noise away from the latent image. If that sentence made no sense to you, and you want to learn more, there is a frankly excellent guide that explains the inner workings of samplers better than I ever could, and it is a highly recommended read. It can be found here.
Beginner Advice: If you just want to know which one to pick to get started, here is a quick rule of thumb.
Euler a: Useful for rapid prototyping and adding details during the trial and error phase of refining your prompts. Euler a is very fast and delivers good quality, and excels in this purpose. However, if you like the image Euler a generated, that's totally valid too. The downside is that Euler a can leave images feeling a bit 'hollow' and light on details sometimes.
(EDITED 2023AUG29) DPM++ 2M Karras: Useful for when you have your prompt mostly dialed in. While Euler a is good, I find that DPM++ 2M Karras is great at making images feel "fuller" with more life and details, however it is noticeably slower than Euler a.
EDIT: This section used to recommend DPM++ 2M SDE Karras. However, after a lot of testing I personally feel that DPM++ 2M Karras is much more "stable" when it comes to generating images. However, I'm not going to redo all the images of this guide. They are still made with DPM++ 2M SDE Karras. This is not to say that it is a bad sampler, it's just not as beginner friendly as DPM++ 2M Karras. Nearly all of the same logic stands for DPM++ 2M Karras, so the guide is still relevant. However, you can get away with fewer steps, like 25, though I still recommend 35 if your GPU can handle it.
Follow Along Step: Set your sampler to DPM++ 2M SDE Karras, and click Generate. But wait? Why did the quality go down? Read on.
Sampling steps is the number of steps you give the AI to decide how to draw your prompt. The key thing to remember here is that more steps does not mean better. If you crank the steps to 150, you could end up with a lot of extra nonsense in your image, because the AI figured out how to draw your image by step 40, but now it has to continue for another 110 steps, so it makes up random stuff while trying to keep it in line with your prompt. This is a gross oversimplification, as the real reason involves going into the "de-noising schedule", but that is way beyond the scope of this guide.
Think of it like a golf swing with a driver. Too few steps means the AI doesn't have enough time to draw your image at all. Too many steps means the AI will overshoot your image and start oscillating trying to maintain your prompt, which can get weird (EDIT: This is not true with DPM++ 2M Karras, hence the edit above. You can set the steps for DPM++ 2M Karras to be anything as long as it's 25 or more).
Also, the number of steps you need changes depending on the sampler used. A sampler like Euler a can get away with 20 steps easily. A sampler like DPM++ 2M SDE Karras can sometimes do 20 steps, but it's better to give it more room to work with.
Beginner Advice: Set your sampling steps to 35. I find that 35 is a healthy middle ground for all samplers, and 35 steps seems to be the cliff edge before samplers start to oscillate. Also, 35 steps lines up perfectly for another setting later on, the Hires. fix. Feel free to play with this range. Recommended settings are between 20 and 50 steps. More than 50 steps is wasted effort (most of the time).
Follow Along Step: Set your steps to 35, and click Generate. Ahh, that's much better. DPM++ 2M SDE Karras just needed more steps to work with.
Restore Faces / Tiling
I won't lie to you, I never use Restore Faces because I use Hires. fix. However, if your graphics card is terrible, and you can't use Hires. fix, you will be stuck on making small images. Restore Faces can come in handy in that regard. Specifically, this setting is a small extra pass at the end of the image generation that revisits any faces and tries to "fix" them by airbrushing them. I've seen it work, with limited success.
Beginner Advice: Leave Restore Faces unchecked unless you can't use Hires. fix and your faces are coming out badly.
Follow Along Step: Don't bother.
Again, I won't lie to you: I've never seriously used Tiling. It generates images that literally tile, so you can use them as wallpaper or something. So I guess if you're trying to make a repeatable image for a wallpaper, hey, knock yourself out.
Beginner Advice: Leave this unchecked.
Follow Along Step: Don't bother.
Hires. fix is one checkbox that gets its own section. (As of A1111 Version 1.6, this is now simply a drop-down menu.) This is where all the magic happens, as it's the single biggest thing you can do to increase the quality of your images. Now, I know what you're thinking... "How can a single checkbox increase the quality of my image so dramatically?" I'm not smart enough to pretend to know the answer to that question, but I can tell you who does. It's the fine people who made the sampling method guide above. They have another excellent guide that explains upscaling. It can be found here.
When you check the Hires. fix box, a bunch of new settings show up. In order to understand these settings, it's useful to understand how upscaling works. To "upscale" means to increase the resolution of the image. It makes the image bigger and adds details and further refinement.
It goes like this:
You generate a base image using:
Hires. fix will then upscale your image using:
The default upscalers included with A1111 are ok. They work well enough, but they leave a lot of quality on the table. I highly recommend getting two extra upscalers from the internet. Download the .pth files, place them in your stable-diffusion-webui/models/ESRGAN folder, and restart A1111. They can be found here:
The two I recommend are:
Beginner Advice: I recommend getting both 4x-UltraSharp and NMKD Siax. They both just kind of work with everything. There are better purpose-built upscalers for things like anime, but you can't go wrong with either upscaler, as they just work with all types of content, including anime. Try them both out for the content you're making. I find Remacri hit or miss on details at very high res, but it's also still very good. 4x-UltraSharp seems to be better at making a cleaner picture, and NMKD Siax seems to be better at handling finer details.
Follow Along Step: Set your upscaler to 4x_NMKD-Siax_200k and click Generate. This will take longer than without Hires. fix, so you might have to wait for a minute or two. This is a great image, but it's distinctly not the image we had before upscaling. The upscaler changed a lot of the details. This is generally ok with more artistic images, but it can wreak havoc on photo-realistic images. Let's keep going.
Hires steps is very similar to sampling steps. Once your base image has been created, the upscaler will upscale your image over the number of steps you have selected in "Hires steps". If you leave this at 0, Hires steps will use the same number of steps as your sampling steps. So if your sampling steps are 35, and your Hires steps are set to 0, then you're going to get 35 Hires steps, for a total of 70 image steps.
Here's the thing about Hires steps. They are slow. The vast majority of your image generation process time will be spent in Hires upscaling. However, this is also where most of the quality of your image is added.
Beginner Advice: I find that 15 Hires steps is a solid number, for almost everything. This also lines up perfectly with 35 sampling steps meaning the total steps required to generate an image is 50.
Follow Along Step: Change your Hires steps to 15, and click Generate. This is also a great image, and it takes much less time to generate, as you don't actually need all those Hires steps. But it's the same problem as before. This isn't the same image we had before upscaling.
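The step counting described above can be written out as a tiny calculation. This is a sketch of the rule as this guide describes it (0 means "reuse the sampling steps"), not A1111's actual code, and `total_steps` is a hypothetical name.

```python
def total_steps(sampling_steps: int, hires_steps: int) -> int:
    """Total steps for one image when Hires. fix is enabled."""
    # Hires steps of 0 falls back to the same number as the sampling steps
    effective_hires = sampling_steps if hires_steps == 0 else hires_steps
    return sampling_steps + effective_hires


print(total_steps(35, 0))   # 70
print(total_steps(35, 15))  # 50
```

Since the Hires steps are the slow ones, dropping from 35 to 15 of them is where most of the time savings come from.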
When the upscaler is processing your image, it is allowed to change a percentage of your total image as the "cost" for upscaling it. This cost is the Denoising strength. A denoising strength of 0 means the upscaler isn't allowed to change anything, which means you won't get any extra quality. A denoising strength of 1 means the upscaler is allowed to change everything and your result will be wildly different than you expect. The default value is 0.7, which is far too high for 4x-UltraSharp and 4x_NMKD-Siax_200k (although it's ok for Latent upscalers, but we aren't going to use those in this guide).
You can think of 'Denoising strength' like using a putter in golf. You want to hit the ball with enough force that it goes solidly into the hole, but not so hard that it flies right over it, and not so gently that the ball doesn't move more than two inches.
In the same way, you need to give the denoising strength enough power that the Hires steps have something to work with to add quality as they upscale. However, with too much strength they will just start adding a ton of extra stuff to your image that you don't want (like extra eyes and limbs). Too little strength and you get no quality. Too much strength and you get "too much".
Beginner advice: Set denoising strength to a value between 0.3 and 0.5, with 0.4 being a nice middle ground.
Follow Along Step: Change your denoising strength to 0.4 and click Generate. This is clearly a much closer image to the one we had before upscaling, and it looks very good. But we want to see more of the dress, so let's give the AI some room to work with.
Upscale by / Width / Height:
This is pretty straightforward. The width and height are the width and height, in pixels, of your base image, not the upscaled image. The "Upscale by" is the multiple you wish to upscale by.
512x512 -> Upscale by 2 -> gives a final image size of 1024x1024
512x768 -> Upscale by 2 -> gives a final image size of 1024x1536
768x1024 -> Upscale by 1.5 -> gives a final image size of 1152x1536
There's no getting around it. Upscaling taxes your graphics card pretty hard, and if you're going to run out of VRAM, it's going to be here. I don't have any generic beginner advice other than to experiment with how hard you can push your GPU before it cries uncle.
Beginner Advice: Use 512x512 (square), 512x768 (portrait), 768x512 (landscape) and see what your GPU can handle. I currently have an RTX 3070 and I can run any combination of these with a 2x upscale, quite comfortably, and quickly. 768x1024 -> 2x is where my GPU cries uncle.
Follow Along Step: Set your height to 768, and leave your width 512, and click Generate. And we now arrive at the image at the beginning of this guide! Well done if you made it this far!
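The size arithmetic is just a multiplication, and seeing the pixel counts side by side gives a rough feel for why the bigger combinations hurt. A minimal sketch, where `final_size` is a hypothetical helper and the megapixel figure is only a loose proxy for how hard your GPU has to work (actual VRAM use depends on many other things):

```python
def final_size(width: int, height: int, upscale_by: float) -> tuple[int, int]:
    """Final image size after Hires. fix, given the base size and the multiplier."""
    return int(width * upscale_by), int(height * upscale_by)


for width, height, upscale_by in [(512, 512, 2), (512, 768, 2), (768, 1024, 2)]:
    final_w, final_h = final_size(width, height, upscale_by)
    megapixels = final_w * final_h / 1_000_000
    print(f"{width}x{height} -> Upscale by {upscale_by} -> {final_w}x{final_h} ({megapixels:.2f} MP)")
```

The 768x1024 base at 2x ends up with three times the pixels of 512x512 at 2x, which lines up with it being the point where my GPU gives up.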
The CFG scale is the amount of force you want to put behind your prompt to the AI. If you set the CFG scale to 30, you are basically putting a gun to the AI's head and forcing it to use only the words provided in your prompt, which results in an "overcooked" look. If you set the CFG scale to 0, you are telling the AI that your prompt doesn't matter even slightly and the AI can do whatever it wants.
Beginner Advice: The default of 7 is actually pretty great. I would just leave it. However, if you feel compelled to change it, it's "safe" to put it anywhere within 5 to 11ish.
Follow Along Step: Set your CFG to 20, and click Generate. We can see that the AI is getting stressed out that it doesn't have enough freedom to be creative. Best to just leave the CFG at 7 for now.
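If you are curious what the "force" actually is, CFG stands for classifier-free guidance. At each step the AI makes two predictions, one with your prompt and one without it (in A1111 the negative prompt takes the place of "without"), and the CFG scale decides how far to push from the second toward the first. The sketch below uses three made-up numbers in place of real model predictions, and `apply_cfg` is a hypothetical name.

```python
def apply_cfg(uncond: list[float], cond: list[float], cfg_scale: float) -> list[float]:
    """Classifier-free guidance: start at the unprompted prediction and
    move toward (and past) the prompted one by a factor of cfg_scale."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0, 2.0]  # toy prediction without the prompt
cond = [1.0, 1.0, 0.0]    # toy prediction with the prompt

print(apply_cfg(uncond, cond, 0))  # [0.0, 1.0, 2.0]   the prompt is ignored
print(apply_cfg(uncond, cond, 1))  # [1.0, 1.0, 0.0]   exactly the prompted prediction
print(apply_cfg(uncond, cond, 7))  # [7.0, 1.0, -12.0] pushed well past it
```

Notice how a large scale blows the numbers far beyond either prediction. That exaggeration is where the "overcooked" look comes from.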
Batch count / Batch size
These have no effect on the quality of your image generation, so I'll only briefly touch on them.
Batch count is how many cycles you want to generate images per "Generate" button press. If you set Batch Count to 100, you will generate 100 images if you press "Generate".
Batch size is how many images each cycle generates. This uses a lot of VRAM. Keep it to 1 for now, because it takes forever, and learning A1111 is more important than pumping out huge numbers of images hoping one of them is good.
Beginner Advice: I wouldn't bother with either one of these for now. Just make single images until you dial in your prompt just the way you want it. Then, if you really feel like it, you can make a batch, and choose the best image.
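For what it's worth, the two settings simply multiply together. A quick sketch (`images_per_generate` is a hypothetical name):

```python
def images_per_generate(batch_count: int, batch_size: int) -> int:
    """Images produced by one press of Generate: cycles times images per cycle."""
    return batch_count * batch_size


print(images_per_generate(100, 1))  # 100
print(images_per_generate(4, 2))    # 8
```

Batch count only costs you time, while batch size is the one that also eats VRAM, because the images within a cycle are made together.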
If Stable Diffusion were to have a holy war, Clip skip could be one of the causes. Some people swear by it. Others think it's pointless.
What is clip skip? When the AI "digests" your prompt so that it can draw your image, it does so in layers. If you have clip skip set to 2, you are telling the AI, "don't do the last layer". If you have a clip skip of 3, you are telling the AI not to do the last two layers of prompt digestion. This is a gross oversimplification, but it's close enough.
Ok, so why is this useful? Because, you can potentially get better results with clip skip 2. Emphasis on potentially.
For reference, here is the final image, but with Clip Skip 2 (Full Res). I personally think the Clip Skip 1 image is better. Clip Skip 2 does tend to be "cleaner", as is the case here. However, my hope is that this shows how subjective the whole debate is.
Beginner Advice: Just leave it on Clip skip 1 or 2 depending on what you prefer. It doesn't matter. (If you really need me to make up your mind, leave it on 2)
Follow Along Step: To change the Clip Skip, go to the Settings Tab in A1111, and select "User Interface" from the Menu List on the left hand side. Then, scroll down until you see a box called "Quicksettings list". Add the CLIP_stop_at_last_layers item to that box. Scroll back to the top and click the big "Apply Settings" orange button and click the "Reload UI" button right next to it. When the UI comes back online, you should see the Clip Skip quicksetting at the very top of A1111.
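The "layers" idea can be pictured as picking an item from the end of a list. This is a toy sketch only: it assumes the 12-layer text encoder used by Stable Diffusion 1.x models, uses labels instead of real layer outputs, and `pick_layer_output` is a hypothetical name.

```python
def pick_layer_output(layer_outputs: list[str], clip_skip: int) -> str:
    """Return the layer output the image generator would read for a given clip skip."""
    if not 1 <= clip_skip <= len(layer_outputs):
        raise ValueError("clip skip must be between 1 and the number of layers")
    # Clip skip 1 reads the last layer, 2 reads the second to last, and so on
    return layer_outputs[-clip_skip]


layers = [f"layer_{i}" for i in range(1, 13)]  # stand-ins for the 12 layer outputs

print(pick_layer_output(layers, 1))  # layer_12
print(pick_layer_output(layers, 2))  # layer_11
print(pick_layer_output(layers, 3))  # layer_10
```

So clip skip 2 hands the image generator a slightly "less digested" version of your prompt, which is why the results are different but not dramatically so.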
How the Copy Generation Data function works
Now, I know what you're thinking.
"So all these settings and stuff is great, but do I really have to set this stuff by hand each time I want to create an image that someone else used?"
And the answer is no, but it's not obvious.
Go to any image on Civitai and you will see the image generation data, if it's included in the image. You can use this as an example: https://civitai.com/images/1379913?postId=354933
Now, at the very bottom right of the page, you will see a tiny button that, when you hover over it, says "Copy generation data". Click that button.
Then go back to A1111, and paste the data into the positive prompt box. It will look like this:
Then you click this blue button with an arrow pointing down left, right underneath the Generate Button, all the way to the left.
Once you click that button, all the parameters in A1111 should set themselves to the appropriate values.
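If you want a feel for what that button does with the pasted text, here is a rough sketch of splitting generation data into its parts. It assumes the common layout of prompt first, then a "Negative prompt:" line, then a final line of settings starting with "Steps:". It is not A1111's real parser, it will trip over setting values that themselves contain commas, and `parse_generation_data` is a hypothetical name.

```python
def parse_generation_data(text: str) -> dict:
    """Split pasted generation data into prompt, negative prompt, and settings."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]

    # The settings live on the last line, which starts with "Steps:"
    settings_line = lines.pop() if lines and lines[-1].startswith("Steps:") else ""

    prompt_lines, negative_lines = [], []
    target = prompt_lines
    for line in lines:
        if line.startswith("Negative prompt:"):
            # Everything from here on belongs to the negative prompt
            target = negative_lines
            line = line[len("Negative prompt:"):].strip()
        target.append(line)

    settings = {}
    for item in settings_line.split(", "):
        key, separator, value = item.partition(": ")
        if separator:
            settings[key] = value

    return {
        "prompt": "\n".join(prompt_lines),
        "negative_prompt": "\n".join(negative_lines),
        "settings": settings,
    }


example = """1girl, medieval village, castle balcony
Negative prompt: blurry, (bad anatomy:1.3)
Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 7, Size: 512x768, Denoising strength: 0.4, Clip skip: 2, Hires upscale: 2, Hires steps: 15, Hires upscaler: 4x_NMKD-Siax_200k"""

data = parse_generation_data(example)
print(data["settings"]["Sampler"])  # DPM++ 2M Karras
print(data["settings"]["Size"])     # 512x768
```

Every key in `settings` corresponds to one of the settings covered in this guide, which is why a single paste can fill in the whole page for you.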