Issue
It was recently discovered that the t5_max_length
setting for Stable Diffusion 3 models (Large and Medium) is not 256, as previously documented. Instead, the correct value is 154. This issue is critical because the 256 limit refers to the total sequence length for all text encoders in SD3.5 models, not just the T5 encoder.
Failing to set the correct t5_max_length
has likely led to sub-optimal training results for many models trained so far, with degraded likeness and inefficiencies.
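To make the setting concrete: roughly speaking, t5_max_length is the length the T5 tokenizer pads and truncates prompts to before they reach the text encoder. Below is a small, hedged sketch of what changes when that cap drops from 256 to 154; it assumes the stock T5 v1.1 XXL tokenizer that SD3.5 uses, loaded here from the public google/t5-v1_1-xxl repo rather than the gated SD3.5 checkpoint.

from transformers import T5TokenizerFast

# Illustration only: what the t5_max_length cap governs at the tokenizer level.
tokenizer = T5TokenizerFast.from_pretrained("google/t5-v1_1-xxl")

caption = "a photo of a red fox sitting in a snowy forest at dawn"

for max_len in (256, 154):
    enc = tokenizer(
        caption,
        padding="max_length",  # pad the sequence up to the cap
        max_length=max_len,    # the value the t5_max_length setting controls
        truncation=True,       # drop any tokens past the cap
        return_tensors="pt",
    )
    real = int(enc.attention_mask.sum())
    print(f"max_length={max_len}: input shape {tuple(enc.input_ids.shape)}, "
          f"{real} non-padding tokens")

Most training captions occupy far fewer than 154 tokens, so truncation only bites on very long prompts; the cap mainly determines the sequence length the text encoder is run at.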
Implications for Existing Models
Models trained without this fix may require retraining for optimal performance.
This discovery impacts all previously trained LoRAs and fine-tuned models, potentially invalidating earlier assumptions about training quality.
Fix
Add the t5_max_length parameter to your configuration file for SD3.5 training. For ai-toolkit it goes under the model section, around line 66 of the train_lora_sd35_large_24gb.yaml file:
model:
  # huggingface model name or path
  name_or_path: "stabilityai/stable-diffusion-3.5-large"
  is_v3: true
  quantize: true # run 8bit mixed precision
  t5_max_length: 154
This works with SD3.5 Medium as well; train it the same way by pointing name_or_path at a Stable Diffusion 3.5 Medium checkpoint, for example:
model:
  # huggingface model name or path
  name_or_path: "stabilityai/stable-diffusion-3.5-medium"
  is_v3: true
  quantize: true # run 8bit mixed precision
  t5_max_length: 154
If you are using SimpleTuner, this change was already implemented about two weeks ago; make sure you're up to date with the repo.
I have yet to test with kohya_ss, although there is no indication that the improvement would not apply there as well. I will update this once I can confirm, and will list the required change.
Why This Matters and Next Steps
Setting t5_max_length=154
aligns your training with the actual architecture of Stable Diffusion 3 models. This simple adjustment reduces degradation, improves likeness retention, and ensures better overall fidelity in your outputs.
If you’ve avoided training SD3.5 due to its challenges, this fix makes it worth revisiting. Update your training pipelines to include t5_max_length=154
and consider reevaluating any models trained under the incorrect assumption of 256 tokens.
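If you cache or compute prompt embeddings outside a trainer, the same cap applies wherever the T5 encoder is called. The sketch below is a hedged example using the plain transformers API and the diffusers-style layout of the SD3.5 repo, where tokenizer_3 and text_encoder_3 are the T5 components; the stabilityai repo is gated, so point it at your own local copy if needed.

import torch
from transformers import T5EncoderModel, T5TokenizerFast

repo = "stabilityai/stable-diffusion-3.5-large"
tokenizer = T5TokenizerFast.from_pretrained(repo, subfolder="tokenizer_3")
text_encoder = T5EncoderModel.from_pretrained(
    repo, subfolder="text_encoder_3", torch_dtype=torch.float16
).to("cuda")

def encode_t5(prompt: str, t5_max_length: int = 154) -> torch.Tensor:
    # Pad/truncate the prompt to t5_max_length tokens, mirroring the
    # t5_max_length value in the training config.
    tokens = tokenizer(
        prompt,
        padding="max_length",
        max_length=t5_max_length,  # 154, not 256
        truncation=True,
        return_tensors="pt",
    ).input_ids.to(text_encoder.device)
    with torch.no_grad():
        # Hidden states of shape (1, t5_max_length, 4096); in SD3.5 these are
        # concatenated with the CLIP embeddings before the MMDiT sees them.
        return text_encoder(tokens).last_hidden_state

embeds = encode_t5("a portrait photo of an astronaut riding a horse")
print(embeds.shape)  # torch.Size([1, 154, 4096])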
This small change has a big impact—now’s the time to unlock the full potential of SD3.5!
For more details, check:
Reddit post from terminusresearchorg explaining the issue: "The T5 text encoder previously claimed to use a sequence length of 256 is now understood to use a sequence length of 154. Updating this results in more likeness being trained into the model with less degradation."
GitHub pull request from SimpleTuner creator bghira, which highlights that "256 tokens is total, not just T5."