
SDXL Ultimate 16 MegaPixel

Type: Workflows
Published: Feb 1, 2025
Base Model: SDXL 1.0
Hash: AutoV2 4774AE9C99
Creator: smeks

Intro

This is my "Ultimate" SDXL workflow. After about a year of messing around with SDXL checkpoints, this is what I've landed on as a good balance of quality and speed: 3 minutes from 0 to 2721x6167, which blows the doors off Flux. It pushes SDXL to the max in detail and resolution, trying to match Flux. There are tons of great SDXL models that I wish had the clean Flux look, and you can make images that look VERY close to Flux by using a lot of these great extensions that, in my opinion, put SDXL on meth. Some of this stuff is very obscure, and most workflows are overly bloated with nonsense. Hopefully this is something you can just expand on as a base.
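A quick sanity check on the "16 MegaPixel" name, using the example resolution from the intro:

```python
# Pixel-count check for the final output size mentioned above.
# 2721 x 6167 is the example resolution from the intro.
w, h = 2721, 6167
megapixels = w * h / 1_000_000
print(f"{megapixels:.1f} MP")  # ~16.8 MP, hence the "16 MegaPixel" name
```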

QuickStart

Everything is initially set up for pure text-to-image. Enter a prompt and hit queue. An example prompt is provided.

Optional Stuff

Everything is set up to do depth-controlnet or image-to-image without messing with anything else. Just enable the relevant group and go:

  • image-to-image

  • depth-controlnet

Advanced

There are many knobs to tweak. Honestly, you'll have to experiment and see what does what, but briefly, here are the most important bits from my experience.

  • SeaArtLongXLClipMerge
    All SD checkpoints have a CLIP text encoder limited to 77 tokens; this patches any CLIP to 248 tokens, allowing a lot more detail. Actual human-readable prompts work much better with it, similar to Flux, though it's still fairly limited. Each tile can hold 248 tokens generated from Florence plus the base prompt, which gives unbelievable results at high tile counts.

  • Clip Text Encode with BREAK
    In conjunction with the long-CLIP merge node, this opens up a lot of new possibilities and simplifies a ton of noodles around conditioning. Conditioning merges are all done in one prompt, separated by BREAK.

  • CLIP NegPip + PerturbedAttentionGuidance
    This definitely helps clean up the image a lot. Generations are much more coherent and make sense, and a lot of artifacting goes away simply by having these enabled. A setting of 1.5-2 is about all you really want.

  • Dynamic Sampler + Detail Daemon
    I have all this set up how I want, but you may want to experiment. These dynamic samplers push a lot more refined, smooth detail out of the image, almost like Flux. Honestly, you don't even need to pass the image through Flux because it looks that clean.
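To make the BREAK idea above concrete: each BREAK-separated chunk of the prompt becomes its own conditioning segment, each with its own token budget (248 tokens after the long-CLIP patch). The prompt text and splitting logic below are an illustration, not the actual node code:

```python
# Hypothetical BREAK-style prompt (illustrative, not from the workflow itself).
prompt = (
    "a cinematic photo of a lighthouse at dusk, volumetric fog "
    "BREAK "
    "weathered stone texture, barnacles, rust streaks "
    "BREAK "
    "warm lamp glow, long exposure ocean waves"
)

# Each chunk is encoded separately and the conditionings are merged,
# replacing what would otherwise be several noodles of Conditioning nodes.
chunks = [c.strip() for c in prompt.split("BREAK")]
for i, chunk in enumerate(chunks, 1):
    print(f"chunk {i}: {chunk}")
```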

Conclusion

Basically, steps 1 and 2 are the primers for the actual tiling in steps 3 and 4. The image has to be nearly perfect in composition and mostly free of major artifacting before going into tiling. Step 1 just tries to get a good composition; you may have to change the conditioning and CFG ramping to get the result you want depending on the model used. Step 2 does the heavy lifting to clean everything up, and 2 megapixels seems to be the sweet spot with the TTPlanet Realistic Tile ControlNet before a full breakup into tiles is required. Steps 3 and 4 need a nearly perfect image to work properly. Going to 8 MP and 16 MP only requires about 30% denoise for "defogging" and "sharpening" everything to bring it all into focus. After 2 MP the need for CFG and denoise goes away because you already have the image; it's really just a matter of massaging defects out along the way to 16 MP. That's just my perspective on it all.
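The four stages above can be summarized as a back-of-the-envelope table. Only the ~2 MP sweet spot, the 8 MP and 16 MP targets, and the ~30% denoise for the tiled stages come from the text; the 1 MP base stage is an assumption (SDXL's native output size):

```python
# Illustrative stage table; step 1 at ~1 MP is an assumed SDXL base size.
stages = [
    ("step 1: composition",  1.0, None),   # base generation, tune CFG ramping
    ("step 2: cleanup",      2.0, None),   # TTPlanet tile controlnet sweet spot
    ("step 3: tiled 8 MP",   8.0, 0.30),   # light denoise, "defogging"
    ("step 4: tiled 16 MP", 16.0, 0.30),   # light denoise, "sharpening"
]
for name, mp, denoise in stages:
    detail = f"{denoise:.0%} denoise" if denoise else "full generation"
    print(f"{name}: ~{mp:g} MP, {detail}")
```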

I'm curious to see what people think, and I'd love to see anything that could improve on this, because I've tried so many workflows in the past and nothing ever seemed to go as far as this.