
My little research about Z-image. LoRA training, fp32 model, upscaling..

After weeks of testing and hundreds of LoRAs, I've finally settled on the LoRA training setup that gives me the sharpest, most detailed, and most flexible results with Tongyi-MAI/Z-Image-Turbo.

This brings together everything from my previous posts:

  • Training at 512 pixels is overpowered and still delivers crisp 2K+ native outputs (meaning the bucket size, not the dataset resolution)

  • Running full precision (no quantization on the transformer or text encoder) eliminates hallucinations and hugely boosts quality – even at 5000+ steps

  • The ostris zimage_turbo_training_adapter_v2 is absolutely essential

Training time with 20–60 images:

  • ~15–22 mins on RunPod on an RTX 5090 at $0.89/hr (you won't actually spend that full amount, since training takes 20 minutes or less)

Template on RunPod: "AI Toolkit - ostris - ui - official"

  • ~1 hour on an RTX 3090 (if you sample 1 image instead of 10 samples per 250 steps)

Key settings that made the biggest difference

  • ostris/zimage_turbo_training_adapter_v2

  • Saves in full precision (dtype: fp32). Note: when we train the model in AI Toolkit we use the full fp32 model, not bf16; this is also why your LoRA looked different and slightly off in ComfyUI.

  • No quantization anywhere

  • LoRA rank/alpha 16 (linear + conv)

  • sigmoid timestep

  • Balanced content/style

  • AdamW8bit optimizer, LR 0.00025 or 0.0002, weight decay 0.0001. Note: I'm currently testing the Prodigy optimizer – results still pending.

  • Steps: 3000 is the sweet spot; it can be pushed to 5000 if you're careful with the dataset and captions.
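To see all of these pieces in one place, here's a rough summary of the settings above written as a plain Python dict, purely for illustration. The key names are approximate and are not the actual AI Toolkit schema; use the configs linked in the next section as the source of truth.

```python
# Illustrative summary of the key settings above -- NOT a real AI Toolkit
# config file. Key names are approximate; use the linked configs instead.
key_settings = {
    "adapter": "ostris/zimage_turbo_training_adapter_v2",
    "save_dtype": "fp32",              # save in full precision, not bf16
    "quantization": None,              # no quantization anywhere
    "lora": {
        "rank": 16, "alpha": 16,       # linear layers
        "conv_rank": 16, "conv_alpha": 16,
    },
    "timestep_type": "sigmoid",
    "content_or_style": "balanced",
    "optimizer": "adamw8bit",
    "learning_rate": 0.00025,          # or 0.0002
    "weight_decay": 0.0001,
    "steps": 3000,                     # sweet spot; up to 5000 with care
    "resolution": 512,                 # bucket size, not dataset resolution
}
```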

3 different AI Toolkit configs:

**Note (applies to all configs):** if your character or style is locked in at an earlier step (e.g. 750–1500), there may still be fine-tuning left to do. So if you feel it already looks good, lower your learning rate from 0.00025 to 0.00015, 0.0001, or 0.00009 to avoid overfitting, and continue training to your intended step count (e.g. 3000 steps or even higher) with the lowered learning rate.

  1. Copy the config, then follow the arrow and click on the Show Advanced tab.

[screenshot]

2. Paste the config file contents in here. After pasting, do not back out; instead follow the arrow and click Show Simple, then, once back on the main page, add and select your dataset.

[screenshot]

Workflows and resources:

Testing workflow: ComfyUI workflow (use the exact settings for testing; bong_tangent also works decently)

fp32 workflow (same as the testing workflow but with the proper loader for fp32)

flowmatch scheduler (the magic trick is here; can also be tested with bong_tangent)

RES4LYF

UltraFluxVAE (this is a must! It provides much better results than the regular VAE)

Pro tips

  • 1. Always preprocess your dataset with SeedVR2 – it gets rid of hidden blur even in high-res images

1A-SeedVR2 Nightly Workflow

A slightly updated SeedVR2 workflow that blends in the original image for color and structure.

(Please be mindful and install this in a separate ComfyUI instance, as it may cause dependency conflicts.)

Link for the SeedVR2 older version: download it as a zip and extract it into your custom_nodes folder. Then go into your python_embed folder and run this command (with your own path, of course): python.exe -m pip install -r "C:\Users\youruser\path-to-your-requirements file\ComfyUI_windows_portable\ComfyUI\custom_nodes\seedvr2_videoupscaler\requirements.txt"

1B- Downscaling py script (a simple Python script I created). I use this to downscale large photos that contain artifacts and blur, then upscale them via SeedVR2. For example, a 2316x3088 image with artifacts or blur isn't easy to work with directly, but I downscale it to 60% and then upscale it with SeedVR2, with fantastic results; this works better for me than the regular resize node in ComfyUI. **Note:** this is a local script, so you only need to replace the input and output folder paths. It handles bulk or individual resizing and finishes in a split second, even for bulk jobs. A minimal sketch of the idea is shown below.
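The linked script isn't reproduced here, but a minimal sketch of the same idea (bulk-downscaling every image in a folder to 60% with Pillow before feeding it to SeedVR2) could look like this; the folder paths and the scale factor are placeholders you would adjust.

```python
# Minimal sketch of a bulk downscaler -- not the author's exact script.
# Adjust INPUT_DIR, OUTPUT_DIR, and SCALE to your own setup.
from pathlib import Path
from PIL import Image

INPUT_DIR = Path(r"C:\path\to\input")    # folder with large/blurry photos
OUTPUT_DIR = Path(r"C:\path\to\output")  # downscaled copies go here
SCALE = 0.60                             # e.g. 2316x3088 -> ~1390x1853

OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

for img_path in INPUT_DIR.iterdir():
    if img_path.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
        continue
    with Image.open(img_path) as im:
        new_size = (round(im.width * SCALE), round(im.height * SCALE))
        # LANCZOS resampling keeps edges clean, which gives SeedVR2
        # something sharp to rebuild detail from
        im.resize(new_size, Image.LANCZOS).save(OUTPUT_DIR / img_path.name)
```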

  • 2. Keep captions simple; don't overdo it!

PSA: When the configuration is followed exactly, I have not observed a single failure case of style preservation across all tested datasets.
For example: you can literally have your character in the style of the SpongeBob show, chilling at the Krusty Krab with SpongeBob, and have SpongeBob intact alongside your character, who will transform into the style of the show! Just thought I'd throw this out there... and no, this will not break a 6B-parameter model, and I'm talking at LoRA strength 1.00 as well. Remember, you also have the ability to change the strength of your LoRA. Cheers!!

IMPORTANT UPDATE: Why Simple Captioning Is Essential

I've seen some users struggling with distorted features or "mushy" results. If your character isn't coming out clean, you are likely over-captioning your dataset.

Z-Image handles training differently from what you might be used to with SDXL or other models.

The "Clean Label" Method

My method relies on a minimalist caption.

If I am training a character who is a man, my caption is simply:

man
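If you want to apply this in bulk, a tiny hypothetical helper like the one below writes the same one-word caption next to every image in your dataset folder. It assumes the usual one-.txt-per-image caption convention; adjust the path and the word to your own subject.

```python
# Hypothetical helper: write a single-word caption file next to each image.
# Assumes the common "image.jpg + image.txt" caption convention.
from pathlib import Path

DATASET_DIR = Path(r"C:\path\to\dataset")  # your training images
CAPTION = "man"                            # the single "clean label"

for img in DATASET_DIR.iterdir():
    if img.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp"}:
        img.with_suffix(".txt").write_text(CAPTION, encoding="utf-8")
```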

Why This Works (The Science)

• The Sigmoid Factor

This training process utilizes a Sigmoid schedule with a high initial noise floor. This noise does not "settle" well when you try to cram long, descriptive prompts into the dataset (a rough code sketch of this sampling appears at the end of this section).

• Avoiding Semantic Noise

Heavy captions introduce unnecessary noise into the training tokens. When the model tries to resolve that high initial noise against a wall of text, it often leads to:

  • Disfigured faces

  • Loss of fine detail

• Leveraging Latent Knowledge

You aren't teaching the model what clothes or backgrounds are; it already knows. By keeping the caption to a single word, you focus 100% of the training energy on aligning your subject's unique features with the model's existing 6B-parameter intelligence.

• Style Versatility

This is how you keep the model flexible.

Because you haven't "baked" specific descriptions into the character, you can drop them into any style, even a cartoon, and the model will adapt the character perfectly without breaking.
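As a rough illustration of the Sigmoid Factor mentioned above: sigmoid (logit-normal) timestep sampling is commonly implemented by squashing a normal sample through a sigmoid, which concentrates training timesteps around the middle of the schedule rather than spreading them uniformly. This is a generic sketch of that common technique, not AI Toolkit's exact implementation.

```python
# Generic sketch of sigmoid (logit-normal) timestep sampling, as used in many
# flow-matching trainers -- not AI Toolkit's exact code.
import torch

def sample_sigmoid_timesteps(batch_size: int) -> torch.Tensor:
    # Squashing N(0, 1) samples through a sigmoid concentrates most timesteps
    # around the middle of [0, 1] instead of sampling them uniformly.
    return torch.sigmoid(torch.randn(batch_size))

print(sample_sigmoid_timesteps(8))
```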

original post with discussion

Additionally, here is the full fp32 model merge:

Full fp32 model here : https://civitai.com/models/2266472?modelVersionId=2551132

Credits:

Tongyi-MAI for the ABSOLUTE UNIT OF A MODEL

Ostris for his absolute legend of a training tool and adapter

ClownsharkBatwing for the amazing RES4LYF samplers

erosDiffusion for revealing the flowmatch scheduler
