Full video here:
I just finished a systematic training study for Flux 2 Klein and wanted to share what I learned. The goal was to train an analog film aesthetic LoRA (grain, halation, optical artifacts, low-latitude contrast).
I ended up with two versions of the Flux 2 Klein LoRA: a 3K-step version with more artifacts/flares and a 7K-step version with better subject fidelity, plus a version for the Dev model. Free on Civitai. But the interesting part is the research.
https://civitai.com/models/691668/herbst-photo-analog-film

Methodology
50+ training runs using AI Toolkit, changing one parameter per run to get clean A/B comparisons. All tests used the same dataset (my own analog photography) with simple captions. Most of the tests were conducted with the Dev model, though when I mirrored the configs for Klein-9b, I observed the same patterns. I also tested on thousands of image generations not covered in this research; here I'll only touch on what I found most noteworthy. *I'd also like to mention that the training config is only one of three parts of this process. The training data is the most important, and I won't cover that here, nor the sampling settings used when running the model.
For each test, I generated two images:
A prompt pulled directly from the training data (can the model recreate what it learned?)
"Dog on a log", tokens that don't exist anywhere in the dataset (can the model transfer the style to new prompts?)
The second test is more important. If your LoRA only works on prompts similar to training data, it's not actually learning a style, it's memorizing.
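The two-probe methodology above can be sketched as a small harness: pair every config variant with both probe prompts so each run produces a directly comparable grid. The `build_jobs` helper and the probe texts are my own illustration, not the author's actual tooling; the generation call itself is left out.

```python
from itertools import product

# Hypothetical A/B harness: pair each training-config variant with both
# probe prompts (one recall probe from the training captions, one transfer
# probe with tokens absent from the dataset). Names are illustrative.

PROBES = [
    ("recall", "a prompt pulled verbatim from the training captions"),
    ("transfer", "dog on a log"),  # tokens that appear nowhere in the dataset
]

def build_jobs(config_variants):
    """One generation job per (config, probe) pair, for side-by-side grids."""
    return [
        {"config": cfg, "probe": name, "prompt": prompt}
        for cfg, (name, prompt) in product(config_variants, PROBES)
    ]

jobs = build_jobs(["default", "dim_ratio_variant"])
for job in jobs:
    print(job["config"], job["probe"])
```

The point of always running both probes is that a config can pass the recall probe while failing the transfer probe, which is exactly the memorization failure described above.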

Example of the two-prompt A/B testing format. The top row is the default AI Toolkit config; the bottom row shows A/B parameter changes (in this case, network dimension ratio variation)
Scheduler/Sampler Testing
Before touching any training parameters, I tested every combination of scheduler and sampler in the KSampler: ~300 combinations.
Winner for filmic/grain aesthetic: dpmpp_2s_ancestral + sgm_uniform
This isn't universal; if you want clean digital output or animation, your optimal combo will be different. But for analog texture, this was clearly the best.
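The sweep itself is just a cross product of the sampler and scheduler lists. A sketch, using a subset of ComfyUI's sampler/scheduler names (the full lists are longer, which is where the ~300 combinations come from):

```python
from itertools import product

# Enumerate sampler x scheduler pairs and queue one generation per pair.
# These are a subset of ComfyUI's KSampler options; the full cross product
# of all samplers and schedulers is what yields roughly 300 combinations.

SAMPLERS = ["euler", "euler_ancestral", "heun", "dpmpp_2m",
            "dpmpp_2s_ancestral", "dpmpp_3m_sde", "ddim", "uni_pc"]
SCHEDULERS = ["normal", "karras", "exponential", "sgm_uniform",
              "simple", "ddim_uniform", "beta"]

grid = list(product(SAMPLERS, SCHEDULERS))
print(len(grid), "combinations in this subset")
```

With a fixed seed and prompt per pair, the resulting images can be tiled into one contact sheet for visual ranking.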

My top picks from testing every scheduler and sampler combo
Key Parameter Findings
Network Dimensions
Winner:
128, 64, 64, 32 (linear, linear_alpha, conv, conv_alpha). **If you want some secret sauce: one thing I've found across every base model I've trained is that this combo is universally strong for style LoRAs of any intent. Many other parameters have effects that depend on the user's goal and taste.

Past this = diminishing returns
Cranking all to 256 = images totally destroyed (honestly, it looks cool, and it made me want to make some experimental models designed for extreme degradation that I'd like to test further, but for this use case: unusable)
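Written out as a network block (here as a Python dict mirroring an AI Toolkit-style YAML config; treat the exact field names as an assumption about your toolkit version, not gospel), the winner looks like:

```python
# Winning network dimensions from the post, as a config-style dict.
# Field names (linear, linear_alpha, conv, conv_alpha) follow the post's
# ordering and are assumed to match the trainer's schema.

network = {
    "type": "lora",
    "linear": 128,       # linear rank
    "linear_alpha": 64,  # linear alpha
    "conv": 64,          # conv rank
    "conv_alpha": 32,    # conv alpha
}

# The destructive extreme from the test above: everything cranked to 256.
degraded = {k: 256 for k in ("linear", "linear_alpha", "conv", "conv_alpha")}
```

The interesting part is the ratio (full rank on linear, halved alpha, halved conv, quartered conv alpha) rather than the absolute numbers.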

256 universal rank degradation on the lower-right images
Decay
Lowering decay by 10x from the default improved grain pickup and shadow texture. This parameter hugely enhanced the low-noise learning of grain patterns; for illustrative and animation models, I would recommend the opposite: increase it.
Highlights bloomed more naturally with visible halation
This was one of the biggest improvements

Decay lowered 5x (bottom) for the Dev model
Lower decay (left):
Lifted black point
RGB channels bleed into each other
Less saturated, more washed-out look
Higher decay (right):
Deeper blacks
More channel separation
Punchier saturation, more contrast
Neither end is "correct". It's about understanding that these parameter changes, though mysterious computer math under the hood, produce measurable differences in the output. The waveform shows it's not placebo; decay has a real, visible effect on black point, channel separation, and saturation.
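A minimal sketch of the decay A/B, assuming "decay" maps to the optimizer's weight decay and assuming a default of 1e-4 (both are my assumptions; check your trainer's config for the actual key and default value):

```python
# Two hedged config variants for the decay A/B described above.
# "weight_decay" and the 1e-4 default are assumptions, not confirmed
# AI Toolkit values; only the 10x direction comes from the post.

DEFAULT_DECAY = 1e-4

configs = {
    # lifted blacks, channel bleed, washed-out filmic look
    "film_grain": {"weight_decay": DEFAULT_DECAY / 10},
    # deeper blacks, more channel separation, punchier saturation
    "illustration": {"weight_decay": DEFAULT_DECAY * 10},
}
```

Everything else held equal, only this one value changes between the left and right waveform examples.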

Far left: low decay; far right: high decay.
Timestep Type
Tested sigmoid, linear, shift
Shift gave interesting outputs, but the defaults (balanced) were better overall for this look. I've noticed when training anime/illustrative LoRAs that training with Shift increased the prevalence of brush strokes and medium-level noise learning.

FP32 vs FP8 Training
For Flux 2 Klein specifically, FP8 training produced better film grain texture
Non-FP8 had better subject fidelity but the texture looked neural-network-generated rather than film-like
This might be model-specific; on other models, I found that training with an fp32 dtype gave noticeably higher fidelity (training time increases nearly 10x, though, so it's often not worth the squeeze until the final iterations of the fine-tune).
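The trade-off can be expressed as a config toggle. The key names below are my own illustration, not confirmed AI Toolkit fields; the pragmatic rule (iterate cheap, reserve fp32 for final runs) is from the post.

```python
# Dtype trade-off as two hedged config variants; key names are
# illustrative placeholders, not guaranteed trainer schema.
dtype_variants = {
    "fp8":  {"dtype": "float8",  "note": "better film-grain texture on Flux 2 Klein"},
    "fp32": {"dtype": "float32", "note": "higher subject fidelity, ~10x training time"},
}

def pick_dtype(final_iteration: bool) -> str:
    """Iterate in fp8; only pay the fp32 cost on final fine-tune passes."""
    return "float32" if final_iteration else "float8"
```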
Step Count
All parameter tests ran at 3K steps (enough to see whether a config is working without burning compute).
Once I found a winning config (v47), I tested checkpoints from 1K to 10K+ steps:
3K steps: More optical artifacts, lens flares, aggressive degradation
7K steps (dev winner): Better subject retention while keeping grain, bloom, tinted shadows
Past 7K steps there was a noticeable spike in degradation, to the point of undesirable anatomical distortion.
I'm releasing both.

Testing v47 of the Dev model from 1K to 10K steps, with checkpoints every 250 steps (1K-8K depicted here)
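The checkpoint sweep above amounts to saving every 250 steps and comparing epochs side by side. A sketch with the post's numbers (the idea of a `save_every`-style setting is an assumption about the trainer's schema):

```python
# Checkpoint sweep: save every 250 steps from the start of training to 10K,
# then pick the step counts that match the desired look. Numbers mirror
# the post; the schedule itself is a reconstruction, not an exact config.

SAVE_EVERY = 250
TOTAL_STEPS = 10_000

checkpoints = list(range(SAVE_EVERY, TOTAL_STEPS + 1, SAVE_EVERY))

picks = {
    "artifact_heavy": 3_000,    # more flares and aggressive degradation
    "subject_faithful": 7_000,  # better subject retention, grain kept
}
```

Saving on a fixed interval like this is what makes the 1K-10K comparison grid possible without retraining.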
If you care to try any of the models:
Recommended settings:
Trigger word: HerbstPhoto
LoRA strength: 0.73 sweet spot (0.4-0.75 balanced, 0.8-1.0 max texture)
Sampler: dpmpp_2s_ancestral + sgm_uniform
Resolution: up to 2K
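For anyone scripting generations (e.g. through a ComfyUI API workflow), the recommended settings bundle into one dict. The field names here are illustrative placeholders for whatever your workflow expects:

```python
# Recommended inference settings from the post, as a single dict.
# Key names are my own; map them onto your workflow's actual fields.

recommended = {
    "lora_strength": 0.73,  # sweet spot; 0.4-0.75 balanced, 0.8-1.0 max texture
    "sampler_name": "dpmpp_2s_ancestral",
    "scheduler": "sgm_uniform",
    "max_resolution": 2048,  # "up to 2K"
}
```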
Happy to answer questions about methodology or specific parameter choices.
