Model Details
A continuation of the NoobAI Flux2 VAE experiment
More info on supporting us: see the Support section below.
Model Description
Training was resumed for 4 more epochs, and the model has shown a nice improvement. We observe good convergence to new details that were hard to achieve on the prior architecture. Compositions and stability are strongly improved relative to Epoch 2, as is downstream trainability (e.g. LoRAs).
The current state is usable for normal generations, so we encourage you to try it. We will provide an easy node for ComfyUI, as well as a basic workflow. If you are an A1111 user, please use ReForge, which has native support; instructions are below.
Once again, we are working with limited compute, but we are quite happy with the result so far and hope to continue working on the model.
Developed by: Cabal Research (Bluvoll, Anzhc)
Funded by: Community
License: fair-ai-public-license-1.0-sd
Finetuned from model: NoobAI Flux2 VAE experiment
Bias and Limitations
Once again, we are limited in budget for this fundamental task. We have adapted the model enough for it to output somewhat acceptable images (closer to a theoretical NoobAI 0.1's knowledge using the Flux 2 VAE), but further progress would require large compute: the model is seeing the new level of detail for the first time (as well as the old level of detail in a new way), and that is hard to learn.
Most biases of the official dataset will apply (Blue Archive, etc.).
Expect noise, fuzzy details, low performance in landscape aspect ratios, bad hands, and general issues with composition as a whole.
Model Output Examples
One of the benefits we have achieved is color:
Because it is a native flow model, it achieves strong colors without making them acidic or otherwise unstable.
Generally, as already stated, expect at least some grain and fuzziness in all gens, as we have not converged to the juicy details yet.
Comfy
(The workflow is available alongside the model in the repo.) We will provide a node, and hope it will be adopted natively in the main repo eventually:
https://github.com/Anzhc/SDXL-Flux2VAE-ComfyUI-Node
Same as your normal inference, but with the addition of the SD3 sampling node, as this model is flow-based.
Recommended Parameters:
Sampler: Euler, Euler A, DPM++ SDE, etc.
Steps: 20-28
CFG: 6-9
Schedule: Normal/Simple/SGM Uniform/Quadratic
Positive Quality Tags: masterpiece, best quality
Negative Tags: worst quality, normal quality, bad anatomy
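Since the model is flow-based, sampling amounts to integrating a velocity field from pure noise (sigma = 1) down to the image (sigma = 0). The sketch below illustrates that idea in plain Python; it is not the code that ComfyUI or the node actually runs. Assumptions: a uniformly spaced base schedule, the SD3-style remap shift * s / (1 + (shift - 1) * s), and a shift of 2.5 used purely as a placeholder borrowed from the training settings (this card does not state an inference shift). `toy_velocity` is a hypothetical stand-in for the UNet.

```python
def shifted_sigmas(steps, shift):
    """Return steps+1 sigmas from 1.0 (noise) to 0.0 (image), with the shift remap applied."""
    sigmas = []
    for i in range(steps + 1):
        s = 1.0 - i / steps  # uniform base schedule (assumption)
        sigmas.append(shift * s / (1.0 + (shift - 1.0) * s))
    return sigmas


def euler_flow_sample(velocity, x, sigmas):
    """Plain Euler integration of dx/dsigma = velocity(x, sigma) along the schedule."""
    for s, s_next in zip(sigmas, sigmas[1:]):
        v = velocity(x, s)
        x = [xi + (s_next - s) * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical stand-in for the UNet: the exact velocity when the data is a single point.
TARGET = [0.5, -1.0]


def toy_velocity(x, sigma):
    return [(xi - ti) / sigma for xi, ti in zip(x, TARGET)]


sigmas = shifted_sigmas(steps=24, shift=2.5)  # 24 steps sits inside the 20-28 range above
result = euler_flow_sample(toy_velocity, [2.0, 3.0], sigmas)
print(result)  # lands on TARGET up to floating-point error
```

A larger shift spends more of the step budget at high sigma, which is where composition is decided; the actual value to use is whatever the workflow in the repo sets.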
A1111 WebUI
(All screenshots are reused from our RF release, as there is no difference in setup)
Recommended WebUI: ReForge - has native support for Flow models, and we've PR'd our native support for Flux2vae-based SDXL modification.
How to use in ReForge:
(ignore Sigma max field at the top, this is not used in RF)
Support for RF in ReForge is being implemented through a built-in extension:
Set parameters to that, and you're good to go.
Flux2VAE does not currently have an appropriate high-quality preview method. Please use the Approx Cheap option, which lets you see a simple PCA projection (ReForge).
Recommended Parameters:
Sampler: Euler A Comfy RF, Euler, DPM++ SDE Comfy, etc. ALL VARIANTS MUST BE RF OR COMFY, IF AVAILABLE. In ComfyUI routing is automatic, but not in the case of WebUI.
Steps: 20-28
CFG: 6-9
Schedule: Normal/Simple/SGM Uniform
Positive Quality Tags: masterpiece, best quality
Negative Tags: worst quality, normal quality, bad anatomy
ADETAILER FIX FOR RF: By default, ADetailer discards the Advanced Model Sampling extension, which breaks RF. You need to add AMS to this part of the settings:
Add advanced_model_sampling_script,advanced_model_sampling_script_backported there.
If that does not work, go into the adetailer extension folder, find args.py, open it, and replace builtinscripts like this:
Training
Model Composition
(Relative to base it's trained from)
Unet: Same
CLIP L: Same, Frozen
CLIP G: Same, Frozen
VAE: Flux2 VAE
Training Details
(Main Stage Training)
Samples seen (unbatched steps): ~50 million
Learning Rate: 6e-5 (General Training) and 3e-5 (Aesthetic)
Effective Batch Size: ~1400 (86x8 batch size, accumulation 2; 86 x 8 x 2 = 1376)
Precision: Mixed BF16
Optimizer: AdamW8bit with Kahan Summation
Weight Decay: 0.01
Schedule: Constant with warmup
Timestep Sampling Strategy: Logit-Normal -0.2 1.5 (sometimes referred to as Lognorm), Shift 2.5
Text Encoders: Frozen
Keep Token: False
Tag Dropout: 10%
Uncond Dropout: 10%
Shuffle: True
VAE Conv Padding: False
VAE Shift: 0.0760
VAE Scale: 0.6043
Additional Features used: Protected Tags, Cosine Optimal Transport.
Total of 6 epochs on Original NoobAI danbooru data.
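To make the timestep sampling entry above concrete, here is a minimal Python sketch of logit-normal sampling followed by a shift. It is an illustration under assumptions, not the code from the training fork: "-0.2 1.5" is read as the mean and standard deviation of the underlying normal, the shift is assumed to be the SD3-style remap shift * t / (1 + (shift - 1) * t) applied after sampling, and t = 1 is assumed to be pure noise. The function names are hypothetical.

```python
import math
import random


def shift_timestep(t, shift):
    """SD3-style remap (assumed); shift > 1 moves probability mass toward t = 1."""
    return shift * t / (1.0 + (shift - 1.0) * t)


def sample_timestep(mean=-0.2, std=1.5, shift=2.5, rng=random):
    """Draw one training timestep in (0, 1)."""
    u = rng.gauss(mean, std)        # draw from the underlying normal
    t = 1.0 / (1.0 + math.exp(-u))  # sigmoid turns it into a logit-normal sample
    return shift_timestep(t, shift)


rng = random.Random(0)
draws = [sample_timestep(rng=rng) for _ in range(10_000)]
print(min(draws), sum(draws) / len(draws), max(draws))
```

With a standard deviation of 1.5 the distribution is fairly wide, so both very low and very high timesteps are still visited; the shift then biases the draws toward the noisier end.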
LoRA Training
The current stage is trainable, but it is hard to achieve accurate reproduction if the subject/content depends on small details, as the base model has not converged to them yet. My current style training settings (Anzhc):
Learning Rate: tested up to 7.5e-4
Batch Size: 144 (6 real * 24 accum), using SGA (Stochastic Gradient Accumulation); without SGA I would probably lower accum to 4-8.
Optimizer: AdamW8bit with Kahan summation
Schedule: ReREX (Use REX for simplicity, or Cosine annealing)
Precision: Full BF16
Weight Decay: 0.02
Timestep Sampling Strategy: Logit-Normal(either 0.0 1.0, or -0.2 1.5), Shift 2.5
Dim/Alpha/Conv/Alpha: 24/24/24/24 (Lycoris/Locon)
Text Encoders: Frozen
Optimal Transport: True
Expected Dataset Size: 100 images (Can be even 10, but balance with repeats to roughly this target.)
Epochs: 50
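The batch size and dataset size entries above come down to simple arithmetic, sketched here in Python. The helper names are hypothetical, and rounding repeats to the nearest integer is an assumption about how "roughly this target" should be read.

```python
def effective_batch(real_batch, accum_steps):
    """Samples contributing to one optimizer step."""
    return real_batch * accum_steps


def repeats_for_target(num_images, target=100):
    """Repeats per image so that num_images * repeats lands near the target."""
    return max(1, round(target / num_images))


print(effective_batch(6, 24))   # 144
print(repeats_for_target(10))   # 10
print(repeats_for_target(25))   # 4
print(repeats_for_target(250))  # 1
```

Datasets already larger than the target simply get a single repeat.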
Hardware
The model was trained on a cloud 8xH200 node.
Software
Custom fork of SD-Scripts (maintained by Bluvoll)
Acknowledgements
Special Thanks
To a special supporter who single-handedly sponsored the whole run and preferred to stay anonymous
Additional donors
-mfcg
-holo
-dyshidrosis
-remix
-edf
Support
If you wish to support our continuous effort of making waifus 0.2% better, you can do it here:
https://ko-fi.com/bluvoll (Blu, donate here to support training)
https://ko-fi.com/anzhc (Anzhc, non-training, just survival)
BTC: 37fLcfxX5ewhJXnb3T9Qzu9jiSLjVtoUJX
ETH: 0xfdF54655796bf2F5bf75192AeB562F8656c1C39E
Send DM to Blu if you want to donate on another network.
Potential future
Expected Compute Needed: We still consider a full run to be in the range of 20+ epochs, but no longer think that it is the bare minimum for a stable model, as progress with just the current 6 epochs has been quite drastic in that regard. 10 epochs is likely a good marker for that.
Dataset: We would love to start processing the booru data with our in-house classification models to fix some of the glaring issues with the default Danbooru dataset, as well as to process some of the concepts more thoroughly, but as of now we don't have the budget to rent a dedicated server for persistent storage.
Future Training: We have confirmation from the sponsor that training will continue beyond Epoch 6, but it will resume after a short break.














