Flux Ultimate@home
How to do high-resolution inference with Flux Dev and Schnell.
You need my Flux Ultimate@Home LoRA to get the best experience from this workflow. https://civitai.com/models/1049158?modelVersionId=1177215
This workflow and LoRA let you perform inference at higher resolutions than the standard Flux Dev model typically allows. High-resolution inference with the standard base model often introduces checkerboarding artifacts and breaks coherent noise scheduling. It gets discussed a bit here: https://github.com/lllyasviel/stable-diffusion-webui-forge/issues/1712
Checkerboarding example:
To address these challenges, the workflow includes several QoL custom nodes designed for higher-resolution work: the Model Sampling Flux Normalized node, which normalizes the sigma schedule so higher resolutions sample cleanly; an empty-latent picker with optimized resolution presets; the Flux Highres Fix Scaler for automatic scaling with noise injection; and a prompt textbox with an integrated token counter.
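For intuition, this is roughly how ComfyUI's stock ModelSamplingFlux derives its resolution-dependent shift. The Normalized variant used here reworks this scaling for high resolutions, and its exact math is the node author's, so treat this as background rather than a description of the custom node:

```python
# Sketch of the baseline shift calculation in ComfyUI's ModelSamplingFlux
# (base_shift / max_shift are the node's inputs). The Normalized node in
# this workflow adjusts this scaling for high resolutions; its internals differ.
def flux_shift(width: int, height: int,
               base_shift: float = 0.5, max_shift: float = 1.15) -> float:
    tokens = (width * height) / (16 * 16)   # latent patches seen by the transformer
    x1, x2 = 256.0, 4096.0                  # token counts the defaults are tuned for
    slope = (max_shift - base_shift) / (x2 - x1)
    return tokens * slope + (base_shift - slope * x1)

print(flux_shift(1024, 1024))   # ~1.15 at ~1 MP, the intended ceiling
print(flux_shift(2048, 2048))   # ~3.23 at ~4 MP, far outside what the base model expects
```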
For optimal results, I try to keep my CLIP-L tokens under 77 and my T5-XXL tokens under 256, although 512 also works reasonably well. Beyond these limits you may see unwanted results. With this inference setup the model gets a bit more literal and likes longer prompts in the T5-XXL.
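If you want to sanity-check prompts outside the workflow's built-in counter, the public tokenizers matching Flux's two text encoders give a close enough count. The model IDs below are the usual Hugging Face repos, not something this workflow ships, and special-token handling may shift the numbers by one or two:

```python
# Rough prompt token counts using the tokenizers that match Flux's encoders.
from transformers import CLIPTokenizer, T5TokenizerFast

clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
t5_tok = T5TokenizerFast.from_pretrained("google/t5-v1_1-xxl")

prompt = "a rainy coastal city at dusk, 35mm photo, natural light"
print("CLIP-L tokens:", len(clip_tok(prompt).input_ids))   # aim for under 77
print("T5-XXL tokens:", len(t5_tok(prompt).input_ids))     # aim for under 256 (512 still works)
```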
The sampler setup is quite delicate, so I recommend adjusting it only if you know what you are doing, particularly when working with the high-resolution fix sampler and scheduler.
This workflow is designed to work in conjunction with my UltimateAtHome LoRA. The LoRA’s primary purpose is to resolve the noise scheduling issues encountered with Flux Dev when performing inference above 1.5 MP. Currently, the resolution limit is 3 MP, but I’m training a new version to support up to 4 MP. https://civitai.com/models/1049158?modelVersionId=1177215
Have fun experimenting
- 42lux
Settings:
UltimateAtHome LoRA:
The LoRA tends towards realism. For general use, a strength of around 0.25 is recommended, as this primarily addresses the noise schedule in high-resolution sampling. In the 0.5–1 range, the LoRA also corrects Flux skin and chin artifacts.
ModelSamplingFluxNormalized and CLIPTextEncodeFlux:
ModelSamplingFluxNormalized and guidance behave much like their standard Flux counterparts.
Recommended guidance values range from 2.5 to 4.2.
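Purely as a reference point outside ComfyUI, here is how the two knobs above (LoRA strength around 0.25 and guidance in the 2.5–4.2 range) map onto a diffusers call; the LoRA filename and adapter name are placeholders:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder filename -- point this at the downloaded UltimateAtHome LoRA.
pipe.load_lora_weights("UltimateAtHome.safetensors", adapter_name="ultimate_at_home")
pipe.set_adapters(["ultimate_at_home"], adapter_weights=[0.25])  # general-use strength

image = pipe(
    "a rainy street market at dusk, natural light",
    guidance_scale=3.5,          # within the recommended 2.5-4.2 range
    width=1536, height=1536,
).images[0]
image.save("test.png")
```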
PAG Attention:
Higher PAG values enhance overall detail. A good general value is 1.75, though personally, I would not recommend exceeding 3.
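Conceptually, PAG blends the normal prediction with one whose self-attention has been deliberately degraded, and the scale controls how hard the output is pushed away from the degraded version. A minimal sketch of that mixing step only (not the attention patching itself), under that assumption:

```python
import torch

def pag_mix(pred: torch.Tensor, pred_perturbed: torch.Tensor,
            pag_scale: float = 1.75) -> torch.Tensor:
    # Push the output away from the perturbed-attention prediction;
    # larger pag_scale means more (and eventually over-cooked) detail.
    return pred + pag_scale * (pred - pred_perturbed)
```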
LyingSigmaSampler:
Stronger (more negative) values refine smaller local details. I suggest not going beyond -0.1.
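A minimal sketch of the idea, assuming the sampler under-reports sigma to the model by a small "dishonesty" factor inside a window of the schedule; the real node's parameter names and windowing may differ:

```python
def lied_sigma(sigma: float, progress: float,
               dishonesty_factor: float = -0.05,
               start_pct: float = 0.1, end_pct: float = 0.9) -> float:
    # progress is 0.0 at the first step and 1.0 at the last. Inside the
    # window, the model is told the image is slightly less noisy than it
    # actually is, nudging it to resolve finer local detail.
    if start_pct <= progress <= end_pct:
        return sigma * (1.0 + dishonesty_factor)
    return sigma
```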
FluxHighresFixScaler:
This node scales the latent and injects noise, letting you choose the upscale method and the amount of noise to apply. If you notice residual noise after the high-resolution fix (e.g., blotchy or uneven backgrounds), this is the parameter to adjust.
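You can picture the node as a plain "upscale the latent, then add noise" step; this sketch only illustrates the two knobs (method and noise amount) and is not the node's actual implementation:

```python
import torch
import torch.nn.functional as F

def scale_and_noise(latent: torch.Tensor, scale: float = 2.0,
                    noise_amount: float = 0.35,
                    method: str = "bicubic") -> torch.Tensor:
    # latent is [batch, channels, height, width] in latent space.
    up = F.interpolate(latent, scale_factor=scale, mode=method)
    # Too little injected noise is what leaves blotchy, uneven backgrounds
    # after the highres pass; too much erodes detail from the base pass.
    return up + noise_amount * torch.randn_like(up)
```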
MultiplySigmas:
This works like the LyingSigmaSampler, but applied to the highres-fix pass.
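The same trick, expressed as an operation on the highres-fix sigma schedule itself; treat the factor as illustrative:

```python
import torch

def multiply_sigmas(sigmas: torch.Tensor, factor: float = 0.97) -> torch.Tensor:
    # Scaling the highres-fix schedule down slightly has the same
    # "pretend there is less noise" effect as the Lying Sigma Sampler.
    return sigmas * factor
```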
HARDWARE REQS:
The workflow just barely fits into 16 GB of VRAM and 32 GB of system RAM at 6 MP resolution.
For Ada Lovelace architecture cards, like the 40xx series, it’s best to use “fp8_e4m3fn_fast” along with Torch Compile. However, since many people run into issues with Torch Compile, it’s not included in this basic workflow.
For Ampere architecture cards, like the 30xx series, using GGUF quants is recommended; otherwise you will consistently run out of memory (OOM).
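If you are unsure which bucket your card falls into, compute capability is a quick check (Ada is 8.9, Ampere is 8.0/8.6). The dtype string mirrors ComfyUI's weight_dtype option; whether you actually need GGUF still comes down to your VRAM:

```python
import torch

major, minor = torch.cuda.get_device_capability(0)
if (major, minor) >= (8, 9):      # Ada Lovelace (RTX 40xx) and newer
    choice = "fp8_e4m3fn_fast (optionally with Torch Compile)"
else:                             # Ampere (RTX 30xx) and older
    choice = "a GGUF quant of Flux Dev"
print(f"Compute capability {major}.{minor}: use {choice}")
```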
PERFORMANCE:
4090 w/o compile:
3MP = 85s
6MP = 220s
4090 w/ compile:
3MP = 30s
6MP = 95s