After weeks of testing and hundreds of LoRAs, I've finally settled on the LoRA training setup that gives me the sharpest, most detailed, and most flexible results with Tongyi-MAI/Z-Image-Turbo.
This brings together everything from my previous posts:
Training at 512 pixels is overpowered and still delivers crisp 2K+ native outputs (512 refers to the bucket size, not the dataset resolution)
Running full precision (no quantization on the transformer or text encoder) eliminates hallucinations and hugely boosts quality, even at 5000+ steps
The ostris zimage_turbo_training_adapter_v2 is absolutely essential
Training time with 20-60 images:
~15-22 mins on RunPod on an RTX 5090 at $0.89/hr (you will not spend the full hourly rate, since training takes 20 mins or less)
RunPod template: "AI Toolkit - ostris - ui - official"
~1 hour on an RTX 3090 (if you sample 1 image instead of 10 samples per 250 steps)
Key settings that made the biggest difference
ostris/zimage_turbo_training_adapter_v2
Saves with dtype: fp32. Note: when we train the model in AI Toolkit we utilize the full fp32 model, not bf16; this was also the reason your LoRA looked different and slightly off in ComfyUI.

No quantization anywhere
LoRA rank/alpha 16 (linear + conv)
sigmoid timestep
Balanced content/style
AdamW8bit optimizer, LR 0.00025 or 0.0002, weight decay 0.0001. Note: I'm currently testing the Prodigy optimizer; results are still in progress.
Steps: 3000 is the sweet spot and can be pushed to 5000 if you're careful with the dataset and captions. (These values are collected in the sketch below.)
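For quick reference, here is a minimal sketch that collects the values above in one place and prints them as YAML, so you can cross-check them against the real config in the Show Advanced editor. The field names (network, train, optimizer_params, timestep_type, quantize) are my assumptions about AI Toolkit's schema and may differ in your version; the full config.yaml linked below is the authoritative source.

```python
# Hedged sketch: the key values from this post, dumped as YAML for cross-checking.
# Field names are assumptions about AI Toolkit's config schema, not the official file.
import yaml  # pip install pyyaml

key_settings = {
    "network": {
        "type": "lora",
        "linear": 16, "linear_alpha": 16,  # rank/alpha 16
        "conv": 16, "conv_alpha": 16,      # same for conv layers
    },
    "train": {
        "optimizer": "adamw8bit",
        "lr": 0.00025,                     # or 0.0002
        "optimizer_params": {"weight_decay": 0.0001},
        "timestep_type": "sigmoid",
        "steps": 3000,                     # up to 5000 with a careful dataset and captions
    },
    "model": {
        "quantize": False,                 # no quantization anywhere
        "quantize_te": False,
    },
}

print(yaml.safe_dump(key_settings, sort_keys=False))
```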
3 different AI Toolkit configs:
Full ai-toolkit config.yaml, optimized for speed.
Heavy training config (use this if you don't mind renting a heavy GPU or own one; minimum 42 GB of VRAM, roughly 1 hr for 3000 steps on an H200). Perks: no rounding errors, full-on beast mode.
**Note (applies to all configs):** if your character or style locks in at an earlier step (e.g. 750-1500), there may still be fine-tuning left to do. If you feel it already looks good, lower your learning rate from 0.00025 to 0.00015, 0.0001, or 0.00009 to avoid overfitting, then continue training to your intended step count (e.g. 3000 steps or even higher) with the lowered learning rate.
1. Copy the config, follow the arrow, and click on the Show Advanced tab.

2. Paste the config file contents in here. After pasting, do not back out; instead, follow the arrow and click Show Simple, then, once inside the main page, add/select your dataset.

Workflows and resources:
ComfyUI workflow (use the exact settings for testing; bong_tangent also works decently)
fp32 workflow (same as the testing workflow but with the proper loader for fp32)
flowmatch scheduler (the magic trick is here; can also be tested with bong_tangent)
UltraFluxVAE (this is a must! It provides much better results than the regular VAE)
Pro tips
1. Always preprocess your dataset with SeedVR2; it gets rid of hidden blur even in high-res images.
SeedVR2 slightly updated workflow, with the original image blended back in for color and structure.
(Please be mindful and install this in a separate ComfyUI, as it may cause dependency conflicts.)
Link for the SeedVR2 older version: download it as a zip and extract it into your custom_nodes folder, then go into your python_embed folder and use this command (with your own path, of course): `python.exe -m pip install -r "C:\Users\youruser\path-to-your-requirements file\ComfyUI_windows_portable\ComfyUI\custom_nodes\seedvr2_videoupscaler\requirements.txt"`
1B. Downscaling py script (a simple Python script I created; I use it to downscale large photos that contain artifacts and blur, then upscale them via SeedVR2. For example, a 2316x3088 image with artifacts or blur is technically not easy to work with, but with this I downscale it to 60% and then upscale it with SeedVR2, with fantastic results. It works better for me than the regular resize node in ComfyUI. **Note:** this is a local script; you only need to replace the input and output folder paths, it handles bulk or individual resizing, and it finishes in a split second even for bulk jobs. A sketch of such a script is included at the end of these pro tips.)
2. Keep captions simple; don't overdo it!
PSA: When the configuration is followed exactly, I have not observed a single failure case of style preservation across all tested datasets.
For example: you can literally put your character in the style of the SpongeBob show, chilling at the Krusty Krab with SpongeBob, and have SpongeBob stay intact alongside your character, who will transform into the style of the show! Just thought I'd throw this out there. And no, this will not break a 6B-parameter model, and I'm talking at LoRA strength 1.00 as well; remember, you can always adjust the strength of your LoRA too. Cheers!
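Regarding tip 1B: the downscaling script itself isn't included in this post, so below is only a minimal sketch of what a bulk 60% downscaler could look like, assuming Pillow is installed. The INPUT_DIR and OUTPUT_DIR paths are placeholders you'd replace with your own, just like in the original script.

```python
# Minimal sketch of a bulk 60% downscaler (not the original script from the post).
# Assumes Pillow: pip install pillow. Replace the placeholder paths with your own.
from pathlib import Path
from PIL import Image

INPUT_DIR = Path(r"C:\path\to\input")    # folder with large photos that have artifacts/blur
OUTPUT_DIR = Path(r"C:\path\to\output")  # downscaled copies land here, then go to SeedVR2
SCALE = 0.60                             # downscale to 60% before upscaling with SeedVR2

OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

for img_path in INPUT_DIR.iterdir():
    if img_path.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
        continue
    with Image.open(img_path) as im:
        new_size = (round(im.width * SCALE), round(im.height * SCALE))
        # Lanczos resampling keeps structure clean while shrinking away artifacts
        im.resize(new_size, Image.LANCZOS).save(OUTPUT_DIR / img_path.name)
        print(f"{img_path.name}: {im.width}x{im.height} -> {new_size[0]}x{new_size[1]}")
```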
IMPORTANT UPDATE: Why Simple Captioning Is Essential
I've seen some users struggling with distorted features or "mushy" results. If your character isn't coming out clean, you are likely over-captioning your dataset.
Z-Image handles training differently from what you might be used to with SDXL or other models.
The "Clean Label" Method
My method relies on a minimalist caption.
If I am training a character who is a man, my caption is simply:
man
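If you want to apply this to a whole dataset folder, the single-word captions can be generated in a couple of lines. This is just a sketch; it assumes your trainer reads a same-named .txt sidecar file next to each image (AI Toolkit does), and "man" is only the example label from above, so swap in your own.

```python
# Sketch: write a one-word caption file next to every image in the dataset folder.
# Assumes same-named .txt sidecar captions; "man" is the example label, swap in yours.
from pathlib import Path

DATASET_DIR = Path(r"C:\path\to\dataset")  # placeholder, point at your training images
CAPTION = "man"

for img in DATASET_DIR.iterdir():
    if img.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp"}:
        img.with_suffix(".txt").write_text(CAPTION, encoding="utf-8")
```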
Why This Works (The Science)
• The Sigmoid Factor
This training process utilizes a Sigmoid schedule with a high initial noise floor. This noise does not "settle" well when you try to cram long, descriptive prompts into the dataset.
• Avoiding Semantic Noise
Heavy captions introduce unnecessary noise into the training tokens. When the model tries to resolve that high initial noise against a wall of text, it often leads to:
Disfigured faces
Loss of fine detail
⢠Leveraging Latent Knowledge
You aren't teaching the model what clothes or backgrounds are; it already knows. By keeping the caption to a single word, you focus 100% of the training energy on aligning your subject's unique features with the model's existing 6B-parameter intelligence.
⢠Style Versatility
This is how you keep the model flexible.
Because you haven't "baked" specific descriptions into the character, you can drop them into any style, even a cartoon, and the model will adapt the character perfectly without breaking.
Additionally, here is the full fp32 model merge:
Full fp32 model here : https://civitai.com/models/2266472?modelVersionId=2551132
Credits:
Tongyi-MAI, for the ABSOLUTE UNIT OF A MODEL
Ostris, for his absolute legend of a training tool and adapter

