_Envy_'s Cauldron 05: My image generation settings

I get asked from time to time what settings I use for my generations. Before I get into the nitty-gritty, a warning: I use a lot of extensions, so it's unlikely that you'll be able to replicate my generations exactly. Most of the time I can't even do it myself, because some of the extension settings are pretty fiddly and aren't saved in the AUTOMATIC1111 image metadata.
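You can check what did get saved yourself. AUTOMATIC1111 writes its generation infotext into a PNG text chunk with the key `parameters`, so any extension setting that isn't added to that infotext simply isn't in the file. Below is a minimal sketch that uses only the standard library to list every text chunk in a PNG. The function name `read_png_text_chunks` is hypothetical and not part of any tool. The sketch assumes the image was saved as PNG, since other formats store metadata differently.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict[str, str]:
    """Collect keyword/text pairs from a PNG's tEXt, zTXt and iTXt chunks."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts: dict[str, str] = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length
        if ctype == b"tEXt":
            # keyword NUL text, both Latin-1
            key, _, value = body.partition(b"\x00")
            texts[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"zTXt":
            # keyword NUL, one compression-method byte, then zlib data
            key, _, rest = body.partition(b"\x00")
            texts[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
        elif ctype == b"iTXt":
            # keyword NUL, compression flag, method, language NUL,
            # translated keyword NUL, then UTF-8 text (optionally zlib-compressed)
            key, _, rest = body.partition(b"\x00")
            compressed = rest[0]
            _language, _, rest = rest[2:].partition(b"\x00")
            _translated, _, text = rest.partition(b"\x00")
            if compressed:
                text = zlib.decompress(text)
            texts[key.decode("latin-1")] = text.decode("utf-8")
        elif ctype == b"IEND":
            break
    return texts
```

Read a saved generation's bytes (for example with `open(path, "rb").read()`) and call `.get("parameters")` on the result. You get the infotext string with the prompt, steps, sampler, CFG and so on, or `None` if nothing was saved under that key.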

Also, most of the functionality I use is available in ComfyUI, and shouldn't be much of a problem for an intermediate user to replicate there.

One other thing I want to mention is that my settings are not the be-all and end-all of image generation. I have arrived at them through experimentation, but there isn't enough time in the world to try every possible setting, so I strongly recommend that you play around with them and make them your own, rather than just using mine verbatim.

Notes on prompting

I've found that with SDXL, particularly with highly finetuned models, it's best to keep your prompt and negative prompt relatively short (the negative prompt especially). With SDXL derivatives, it's generally pretty safe to assume that your output will look nice, so adding tons of synonyms of "low quality" to your negative prompt won't do a whole lot, and if you add too many, things can actually start to look worse. Furthermore, adding something like "extra arms" may fix an image with extra arms, but for other images that wouldn't have had extra arms, it can be detrimental to image quality.

This isn't to say don't use negative prompts, but you probably want to dump most of your negative prompt boilerplate and target your negative prompt to actually avoid problems with the specific image you're trying to generate (or with specific models to offset biases and problems).

General high-quality image generation

These settings seem pretty reliable across most SDXL checkpoints for most types of generation. They aren't the fastest thing in the world, but they'll get the job done and bring out the best in whatever checkpoints or LoRAs you're using.

  • Resolution - I keep this around 1 megapixel. Generally I do 832x1248 (2:3), but also sometimes 1024x1024 (1:1) and 768x1360 (9:16). These also work well in landscape orientations.

  • Sampling steps - 20

  • Sampler - DPM++ 2M SDE Karras or DPM++ 2M Karras Sharp v1

  • CFG Scale - Varies by checkpoint. Generally I'll go as high as I can without the image looking overcooked. Sometimes that's 12, sometimes it's 3, and oftentimes it's somewhere in between.

  • Batch Size - TEST THIS on your machine. I've set mine to 1 because larger batch sizes sometimes actually slow things down (1 image will take 30 seconds, and a batch of 2 will take 1:10, or something like that).

  • Hires Upscale - 1.6.

  • Upscaler - 4x-UniScaleV2_Soft. Originally I used the sharpest ones I could find, but this one actually gives me really nice results that I find are plenty sharp.

  • Hires Denoise Strength - Roughly 0.4, but it varies a bit.

  • Refiner - No.

  • ADetailer - Default, except 1024x1024 resolution and 8 steps.

  • Dynamic Thresholding - Usually off, unless specifically needed. When I have it on, I usually have the best luck setting it 1 to 2 below the CFG value.

  • CD Tuner - Detail (d1) is 2. All others set to 0.

  • FreeU - Screw it, here's a screenshot:

  • Kohya Hires.fix:

    • Stop Step: 5% of your total steps

    • Depth 3

    • Stop Step 0

    • Depth 6

    • Scale 1.5

    • Disable for later passes: YES

  • Self Attention Guidance: (this updated fork works on the current a1111 release)

    • Guidance scale 0.75

    • Mask Threshold 1.8

    • Gaussian Blur Sigma 1
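A few of the numbers above are easier to reason about with the arithmetic spelled out. This is a rough sketch and not code that any of the extensions actually run. All helper names are hypothetical. The hires output size assumes simple truncation of base size × scale, and the Kohya stop step assumes "5% of your total steps" is rounded to the nearest whole step. The batch timing uses the illustrative 30 s and 1:10 figures from the Batch Size note, not a benchmark.

```python
# Base resolutions from the list above (width, height), portrait orientation.
BASE_RESOLUTIONS = [(832, 1248), (1024, 1024), (768, 1360)]


def megapixels(width: int, height: int) -> float:
    """Pixel count in millions; all three base sizes land near 1 MP."""
    return width * height / 1_000_000


def hires_size(width: int, height: int, scale: float = 1.6) -> tuple[int, int]:
    """Final size after the hires pass (assumption: plain truncation of size * scale)."""
    return int(width * scale), int(height * scale)


def kohya_stop_step(total_steps: int, fraction: float = 0.05) -> int:
    """'5% of your total steps', rounded, but never less than one step."""
    return max(1, round(total_steps * fraction))


def dynamic_thresholding_range(cfg: float) -> tuple[float, float]:
    """Hypothetical helper for the rule of thumb above: 1 to 2 below the CFG value."""
    return cfg - 2, cfg - 1


def seconds_per_image(batch_seconds: float, batch_size: int) -> float:
    """Per-image cost of a batch; compare this across batch sizes on your machine."""
    return batch_seconds / batch_size


if __name__ == "__main__":
    for w, h in BASE_RESOLUTIONS:
        hw, hh = hires_size(w, h)
        print(f"{w}x{h}: {megapixels(w, h):.2f} MP -> hires {hw}x{hh} ({megapixels(hw, hh):.2f} MP)")
    print("Kohya stop step at 20 steps:", kohya_stop_step(20))
    print("Batch of 1:", seconds_per_image(30, 1), "s/image")
    print("Batch of 2:", seconds_per_image(70, 2), "s/image")
```

With those example timings, a batch of 2 costs 35 seconds per image versus 30 for single images. That is why the batch size is worth measuring rather than maxing out.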

Fast hires image generation

Sample output here

This is a bit more touchy and situational, but it's extremely fast (if you have the VRAM for it) and doesn't take the quality hit that LCM does. It also skips hires fix, so you don't burn a bunch of time running the image through the VAE twice. It tends to produce lighter, sharper lines and lots of fine detail, sometimes at the expense of the image's overall coherence. These settings are a work in progress and don't produce good images quite as reliably as the ones above (people in particular, since they sometimes cause anatomy problems), but they're nice for scenery, architecture, and other images where lots of dense detail is desirable.

These settings assume 24 gigabytes of VRAM. If you have less than that, you may need to lower the resolution or enable tiled VAE.
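If you do need to go smaller, one approach is to keep the 1200x1900 aspect ratio and shrink both sides until the pixel count fits your budget. The sketch below does that. The helper name and the megapixel targets are hypothetical, since the right budget depends on your card and whatever else is loaded. Rounding each side down to a multiple of 8 is a common convention because the SDXL VAE downsamples by a factor of 8.

```python
import math


def fit_to_megapixels(width: int, height: int, target_mp: float, multiple: int = 8) -> tuple[int, int]:
    """Shrink (width, height) to at most target_mp megapixels, keeping the aspect
    ratio and rounding each side down to a multiple of `multiple`. Never enlarges."""
    scale = min(1.0, math.sqrt(target_mp * 1_000_000 / (width * height)))
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h
```

For example, `fit_to_megapixels(1200, 1900, 1.5)` gives 968x1536. Treat the target as a starting point and adjust it based on whether you actually run out of memory.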

  • Resolution: 1200x1900

  • Sampling steps: 40 (Sampling steps are cheap here because a lot of time is spent in the VAE, which isn't affected by them)

  • Sampling method: DPM++ 2M SDE Karras

  • CFG Scale: Same as above

  • Hires Fix: Off, unless you want to run out of VRAM.

  • ADetailer: As above, if you need to run it. I generally leave it off here.

  • Dynamic Thresholding: SD 1.4 defaults (yes, for SDXL; not a typo. This helps Kohya Hires.fix ease up a bit on the excessive detail.)

  • Kohya Hires.fix:

    • Stop at 0.25

    • Depth 3

    • Stop at 0.35

    • Depth 6

    • Scale 1.3

  • SAG: Disabled, as it makes contrast go crazy at higher Kohya Hires.fix settings. This may have been fixed since, so try it out; YMMV.
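This sketch shows what the Kohya "stop at" values work out to in steps, and how large this single pass is compared with the 832x1248 base from the first recipe. It assumes the stop values are fractions of the total sampling steps. The first recipe phrases its stop step as a percentage of total steps, which suggests this, but it is still an assumption. The helper names are hypothetical.

```python
import math

FAST_WIDTH, FAST_HEIGHT, FAST_STEPS = 1200, 1900, 40
BASE_WIDTH, BASE_HEIGHT = 832, 1248  # usual base resolution from the first recipe


def stop_fraction_to_step(fraction: float, total_steps: int) -> int:
    """Convert a 'stop at' fraction to a step number (assumption: the fraction
    is measured against the total number of sampling steps)."""
    return round(fraction * total_steps)


def equivalent_linear_scale(width: int, height: int, base_w: int, base_h: int) -> float:
    """Linear upscale factor that would turn the base pixel count into this one."""
    return math.sqrt((width * height) / (base_w * base_h))


if __name__ == "__main__":
    for depth, stop in ((3, 0.25), (6, 0.35)):
        step = stop_fraction_to_step(stop, FAST_STEPS)
        print(f"Depth {depth}: stops at roughly step {step} of {FAST_STEPS}")
    scale = equivalent_linear_scale(FAST_WIDTH, FAST_HEIGHT, BASE_WIDTH, BASE_HEIGHT)
    print(f"{FAST_WIDTH}x{FAST_HEIGHT} is about {scale:.2f}x the {BASE_WIDTH}x{BASE_HEIGHT} base")
```

Under these assumptions, the fast recipe lands at roughly a 1.48x hires-sized image in a single pass. It skips the separate hires pass and its extra trip through the VAE.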
