
Working with Stable Diffusion

Aug 26, 2023
tool guide

Installing and configuring webui

I recommend SD NEXT over the Automatic1111 Web UI. It's a fork of A1111, and I like it better.

Stability Matrix

Installing the webui can be fiddly. After I did it, I learned about Stability Matrix, which is essentially a Stable Diffusion installer designed to let you run multiple frontends that share the same models and other resources. Even if you just use webui, the installation is a snap.

I use SD Next, a fork of A1111.

I don't actually use it

While I did install the tools with Stability Matrix, I don't currently use it. You can configure the model directories of SD NEXT to point where you want, and Comfy can do the same.

I have like 70 gigs of SD15 models, and about half that in SDXL models, so I keep them on a separate drive now.

Flags and environment variables

I have an NVIDIA RTX 3070, which has 8 GB of VRAM. I start webui using the following flag:

--xformers

As far as environment variables go, PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512 seems to work well.
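
Putting those together, a launch looks roughly like this. This is only a sketch: the script name and the way you set the environment variable depend on your install and OS (on Windows you'd set the variable with set, or in your launch .bat, instead):

PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512 ./webui.sh --xformers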

SD NEXT

For SD NEXT, there are a few things I've set up. I think it comes with ADetailer, Tiled Diffusion, and Agent Scheduler.

Favored extensions

https://github.com/zanllp/sd-webui-infinite-image-browsing is an awesome in-app image browser that lets you send images for further processing, zoom in on them, compare them, and search for them.

https://github.com/thomasasfk/sd-webui-aspect-ratio-helper makes it easy to use your preferred resolutions.

https://github.com/butaixianran/Stable-Diffusion-Webui-Civitai-Helper gets data from Civitai for Loras and the like.

https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111 makes upscaling stuff not take so much memory, so you can make bigger images.

https://github.com/ArtVentureX/sd-webui-agent-scheduler lets you queue jobs. Works great with Infinite Image Browsing for cherry picking stuff to upscale.

https://github.com/adieyal/sd-dynamic-prompts lets you use wildcards to get different prompts with each generation.
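
For example, Dynamic Prompts lets you write inline variants with curly braces and pull a random line from a wildcard file with double underscores. The prompt below picks one of the three colors and one line from a hypothetical wildcards/locations.txt file you'd create yourself:

a photo of a {red|blue|green} car parked in __locations__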

ControlNet

https://github.com/Mikubill/sd-webui-controlnet lets you make images that conform to a guide image, like a pose, a depth map, or an edge sketch.

https://github.com/huchenlei/sd-webui-openpose-editor integrates a pose editor to use with controlnet so you can pose people.

Settings

You can specify where your models are, where your generated images are saved, and the theme of the UI.

Where to keep all those huge checkpoints

The model locations are set under "System Paths".
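
As a sketch (the drive letter and folder names here are a hypothetical layout, not anything SD NEXT requires), you might point them at something like:

D:\sd-models\Stable-diffusion for checkpoints
D:\sd-models\Lora for Loras
D:\sd-models\VAE for VAEs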

Where to save your images

The output directories are set under "Image Paths".

Under "Image Options":

  1. Set the "Images filename pattern" to [seq]-[model]-[uuid].

  2. Check "Save images to a subdirectory" and "Save grids to a subdirectory"

  3. Set the "Directory name pattern" to [datetime<%Y>]\[datetime<%m>]\[datetime<%d>].

This will save your stuff neatly in folders like 2023\08\21.
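With the filename pattern above, a single image lands somewhere like this (the base output folder, the zero-padding on the sequence number, and the uuid format are whatever your version of webui generates; the placeholders are just illustrative):

outputs\2023\08\21\00042-<model>-<uuid>.jpg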

I save all my images as jpg. They're much smaller, and you can't really tell the difference between them and png. I'd consider webp, but unfortunately Civitai doesn't parse the embedded generation info in webp files.

Use a different theme

You can find the UI Theme under "User Interface". I'm using reilnuud/polite.

Generating images

Specifying default settings

You can change the default settings for most of the inputs by editing ui-config.json. You can also use "User interface defaults" in Settings.

Set your default sampler, resolution, upscaler, and denoising strength here.
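
As a rough sketch of what ui-config.json entries look like (the exact keys mirror the UI labels and vary between versions and forks, so check the file webui generates rather than copying these verbatim):

"txt2img/Sampling method/value": "DPM++ 2M Karras",
"txt2img/Sampling steps/value": 25,
"txt2img/Width/value": 768,
"txt2img/Height/value": 768,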

Checkpoints

Each checkpoint may specify its own "best" settings. A lot of SD15 anime models are derived from an older model that was trained against the penultimate layer of the CLIP text encoder, so they all use clip skip 2.

Samplers

Samplers fall into a few broad groups.

A sampler is either fast or slow.

A sampler is either converging or non-converging. A converging sampler refines the image's composition as steps increase, rather than drastically changing it.

Samplers also differ in the kind of image they produce from the same prompt.

You can learn all about samplers from Silicon Thaumaturgy's video.

As a standard practice, I use DPM++ 2M Karras at 25 steps.

Dimensions

The bigger the better, for the original image. Keep in mind that changing dimensions will impact the image generated by the same prompt. Since prompts are wild, unstable magic, you should decide your dimensions first, before fine-tuning your prompt.

I typically prompt at 768x768 or 512x768.

VAE

If your image looks washed out, the checkpoint you're using probably doesn't include a baked-in VAE. If that happens, specify a VAE explicitly.
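
If you need a standalone VAE for an SD15 model (assuming your checkpoint really is missing one), the one you'll see recommended most often is Stability AI's, usually distributed as the file below; drop it in your VAE folder and select it in settings:

vae-ft-mse-840000-ema-pruned.safetensors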

Finding the right image

Batching

You can run your prompt in batches. Batch size is how many images are generated in a single pass; increasing it uses more memory (it looks like it tiles all of the images in the batch into a single image). Batch count is how many of those passes to run. Each additional image in a batch increases the seed number by 1, resulting in a different image. I like to create 6 images from a prompt at a time.
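
As a worked example: batch count 3 with batch size 2 gives 6 images per run, and with a starting seed of 1000 they'd come out with seeds 1000 through 1005.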

X/Y/Z script

You can also experiment with the X/Y/Z script, which will allow you to, for example, run the same prompt on different models, with different samplers, or different steps. It will output a grid that allows you to view the results all together. Of note here is "Prompt S/R", which will search and replace the first item in your comma-separated list with the other items in your comma-separated list. An example can be found at https://civitai.com/posts/335853.
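
As a quick illustration of Prompt S/R: with the prompt "a portrait of a woman, oil painting" and the S/R values "oil painting, watercolor, pencil sketch", the grid gets one column with the prompt as written and one each with watercolor and pencil sketch swapped in.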

Making stuff bigger

When generating images, I messed around a lot with different methods of taking an image and making it better. I ran into a lot of out-of-memory errors along the way. In the end, the solution was simple: Tiled Diffusion.

Tiled Diffusion is an alternative to txt2img hires fix, an Extras upscale, img2img SD upscale, and img2img Ultimate SD upscale. It allows you to scale the image up using an upscaler like 4x-UltraSharp or ESRGAN_4x without freaking out over your memory so much. It can also work in tandem with hires fix.

Tiled VAE lets you render bigger stuff; it's the non-upscaling part of the picture.
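
A rough starting point on 8G of VRAM (a sketch only; the exact labels vary between versions of the extension): enable Tiled Diffusion with the upscaler set to something like 4x-UltraSharp and a scale factor of 2, leave the latent tile size and overlap at their defaults, and enable Tiled VAE, lowering its tile sizes only if you still run out of memory.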
