I'm trying to figure out the best performance config for Automatic1111 with SDXL 1.0; right now it seems... slow. I can get maybe 4 it/s at 1024x1024, but it often falls back to 1+ seconds per iteration and seems to drag, even while CUDA usage is at 99%.
My command line args right now are:
COMMANDLINE_ARGS=--xformers --opt-sdp-no-mem-attention --no-half-vae --opt-channelslast
Hi, --xformers and --opt-sdp-no-mem-attention are both cross-attention optimizers, so you should pick just one - only one of them will actually be used. On an NVIDIA GPU you generally want --xformers, provided you have at least 8 GB of VRAM. With 6 GB or less, --opt-sdp-attention can sometimes be the better choice: it's slower, but uses less VRAM.
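To make that concrete, here's a minimal sketch of what the trimmed-down launch config could look like (the exact flag set is a suggestion, not *the* correct answer - it just drops the redundant sdp flag):

```shell
# webui-user.sh (on Windows, webui-user.bat uses "set" instead of "export").
# Keep exactly ONE attention optimizer; --xformers is usually the fastest
# choice on NVIDIA cards with 8 GB+ VRAM, so --opt-sdp-no-mem-attention
# is dropped here. --no-half-vae stays because SDXL's fp16 VAE can misbehave.
export COMMANDLINE_ARGS="--xformers --no-half-vae --opt-channelslast"
```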
"right now it seems... slow" - well, what did you expect from native 1024x1024 generation?
Also, I've heard some people get faster generation and lower RAM/VRAM usage on (Un)ComfyUI, which is a node-based Stable Diffusion WebUI.
I've never seen any definitive "best" performance settings for SDXL, and if you were hoping to set all of this through a config file - you can't; these are command-line options.
You can also try the token merging settings: in the WebUI, go to Settings -> Optimizations ->
Token merging ratio (0=disable, higher=faster)
Token merging ratio for high-res pass (only applies if non-zero and overrides above)
Values that are too high can lower quality, and how much depends on the model. I'm not sure whether this works for XL models, though.