Type | Workflows |
Stats | 366 0 |
Reviews | (20) |
Published | Feb 1, 2025 |
Hash | AutoV2 4774AE9C99 |
Intro
This is my "Ultimate" SDXL workflow. After about a year of messing around with SDXL checkpoints, this is what I've landed on as a good balance of quality and speed: about 3 minutes from zero to 2721x6167, which blows the doors off Flux. It pushes SDXL to the max in terms of detail and resolution, trying to match Flux. There are tons of great SDXL models that I wish had the clean Flux look, and you can make stuff that looks VERY close to Flux by using a lot of these great extensions that, in my opinion, put SDXL on meth. Some of this stuff is very obscure, and most workflows out there are bloated with nonsense. Hopefully this is something you can just expand on as a base.
QuickStart
Everything is initially set up for pure text-to-image. Enter a prompt and hit Queue; an example prompt is provided.
Optional Stuff
Everything is set up to do depth-controlnet or image-to-image without messing with anything else. Just enable one of the following groups and go:
image-to-image
depth-controlnet
Advanced
There are many knobs to tweak. Honestly, you'll have to experiment and see what does what, but briefly, here are the bits I've found most important from my experience:
SeaArtLongXLClipMerge
All SD checkpoints have a CLIP text encoder limited to 77 tokens; this node patches any CLIP to 248 tokens, allowing a lot more detail. Actual human-readable prompts work much better with it, similar to Flux, though it's still very limited. Each tile can hold 248 tokens generated from Florence plus the base prompt, which gives unbelievable results at high tile counts.

Clip Text Encode with BREAK
In conjunction with the long clip merge node, this opens up a lot of new possibilities and simplifies a ton of noodles around conditioning. Conditioning merges all live in one prompt, separated by BREAK.

CLIP NegPip + PerturbedAttentionGuidance
These definitely help clean up the image a lot. Generations are much more coherent and make sense, and a lot of artifacting goes away by simply having them enabled. A setting of 1.5-2 is about all you really want.

Dynamic Sampler + Detail Daemon
I have all of this set up how I want, but you may want to experiment. These dynamic samplers push a lot more refined, smooth detail out of the image, almost like Flux. Honestly, you don't even need to pass the image through Flux because it looks that clean.
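As a rough illustration of how the BREAK keyword above behaves, here is a plain-Python sketch (this is NOT the ComfyUI node, which splits after tokenization; the prompt and function name are just for illustration). Each segment gets encoded on its own and the resulting conditionings are merged:

```python
def split_on_break(prompt: str) -> list[str]:
    # Illustration only: split one prompt string into the conditioning
    # segments that BREAK separates. A real encoder would tokenize each
    # segment (up to 248 tokens with the long-clip patch) and merge the
    # conditionings afterward.
    return [seg.strip() for seg in prompt.split("BREAK") if seg.strip()]

prompt = "a portrait of a woman BREAK soft golden hour lighting BREAK 35mm film grain"
for segment in split_on_break(prompt):
    print(segment)  # each of the three segments is encoded independently
```

The point is that one human-readable prompt replaces a pile of separate conditioning nodes and merge noodles.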
Conclusion
Basically, steps 1 and 2 are the primers for the actual tiling steps, 3 and 4. The image has to be nearly perfect in composition and mostly free of major artifacting before going into tiling. Step 1 just tries to get a good composition; you may have to change the conditioning and CFG ramping to get the desired result depending on the model used. Step 2 does the heavy lifting to clean everything up, and 2 megapixels seems to be the sweet spot with the TTPlanet Realistic Tile ControlNet before a full breakup into tiles is required. Steps 3 and 4 need a nearly perfect image to work properly: going to 8MP and 16MP only requires 30% denoise for "defogging" and "sharpening" everything to bring it all into focus. After 2MP the need for CFG and denoise goes away, since you already have the image; it's really just a matter of massaging defects out along the way to 16MP. That's just my perspective on it all.
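The staged schedule above can be sketched as simple arithmetic. This is my reading of the stages, not values exported from the workflow: the stage-2 denoise is a placeholder, and only the 30% figure for stages 3 and 4 comes from the text. The aspect ratio is taken from the 2721x6167 example in the intro:

```python
ASPECT = 2721 / 6167  # final portrait aspect ratio from the intro example

def dims_for_megapixels(mp: float, aspect: float = ASPECT) -> tuple[int, int]:
    """Width/height (snapped to multiples of 8) for a given pixel budget."""
    pixels = mp * 1_000_000
    h = (pixels / aspect) ** 0.5
    w = h * aspect
    snap = lambda v: int(round(v / 8)) * 8  # SDXL latents want multiples of 8
    return snap(w), snap(h)

# stage -> (megapixel budget, denoise); stage-2 denoise is illustrative only
schedule = {
    1: (1.0, 1.00),   # base text-to-image composition pass
    2: (2.0, 0.50),   # tile-controlnet cleanup, the "sweet spot" (placeholder denoise)
    3: (8.0, 0.30),   # tiled "defogging" pass
    4: (16.0, 0.30),  # tiled "sharpening" pass
}
for stage, (mp, denoise) in schedule.items():
    w, h = dims_for_megapixels(mp)
    print(f"stage {stage}: ~{mp:.0f} MP -> {w}x{h}, denoise {denoise}")
```

This also makes the point in the conclusion concrete: past 2MP the pixel budget quadruples twice, but the denoise stays low because the composition is already locked in.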
Curious to see what people think, and I'd love to see anything that could improve on this, because I've tried so many workflows in the past and nothing ever seemed to go as far as this one.