Terminus Relay

Verified: SafeTensor
Type: LyCORIS
Published: Jan 20, 2025
Base Model: SD 3.5 Medium
Training: Steps: 295,000, Epochs: 3
Usage Tips: Strength: 1
Hash (AutoV2): EAD76FE90E
Creator: ptx0
This Stability AI Model is licensed under the Stability AI Community License, Copyright (c) Stability AI Ltd. All Rights Reserved.
Powered by Stability AI

Welcome to Terminus Relay!

This model series is the culmination of millions of training steps burnt into the Stable Diffusion 3.5 abyss. The name Relay refers to the hand-off of the baton in the race to build a usable SD 3.5 Medium ecosystem.

Currently, a 3.5 Medium version is available in LyCORIS LoKr format. This adapter is reasonably small at 350MiB, so it should run on most consumer hardware!

The 3.5 Medium model was selected both for the genuinely challenging nature of training it and for the promising potential of a 2.6B-parameter model with a 16-channel VAE. If we could get it to work, well, great things could come from this foundation!

The v1 version of Terminus Relay has roughly 55,000 steps of finetuning on very high quality photos, cinematic still extracts, and typography images to improve the model's understanding of text.

The meme potential of this model is high!

The v2 model has reached 295,000 steps of finetuning, mostly on high-quality images containing a lot of text (signs, handwriting on paper, etc.) plus ~28k stock photos pulled from a pre-AI-era dataset (none of these had watermarks).

The dataset size was actually reduced between v1 and v2 to focus the model more on composition and prompt adherence than on direct anatomical improvements.

The v1 model may be more creative, but the v2 model is more stable. The earlier version requires a higher CFG of around 5-8, while the newer one requires a lower CFG of 2-4.
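
The CFG numbers above plug into the standard classifier-free guidance combination used by SD 3.5 samplers. A minimal sketch with NumPy (the prediction values are illustrative, not from the model):

```python
import numpy as np

def apply_cfg(uncond_pred, cond_pred, guidance_scale):
    """Classifier-free guidance: push the prediction away from the
    unconditional output, toward the prompt-conditioned one."""
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

# Illustrative noise/velocity predictions for two latent elements.
uncond = np.array([0.1, 0.2])
cond = np.array([0.3, 0.6])

low = apply_cfg(uncond, cond, 3.0)   # v2-style: gentler push (CFG 2-4)
high = apply_cfg(uncond, cond, 6.0)  # v1-style: stronger push (CFG 5-8)
```

Higher guidance amplifies the difference between the two predictions, which is why an over-trained or well-converged model like v2 tends to over-saturate and burn at the CFG values v1 needs.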

Training details

SimpleTuner configuration

This goes into config/sd3/config.json

{
  "--resume_from_checkpoint": "latest",
  "--quantize_via": "cpu",
  "--data_backend_config": "config/sd3/multidatabackend.json",
  "--aspect_bucket_rounding": 2,
  "--seed": 42,
  "--minimum_image_size": 0,
  "--disable_benchmark": false,
  "--output_dir": "output/sd3",
  "--lora_type": "lycoris",
  "--lycoris_config": "config/sd3/lycoris_config.json",
  "--max_train_steps": 300000,
  "--num_train_epochs": 0,
  "--checkpointing_steps": 5000,
  "--checkpoints_total_limit": 5,
  "--hub_model_id": "sd35m-photo-1mp",
  "--push_to_hub": "true",
  "--push_checkpoints_to_hub": "true",
  "--tracker_project_name": "lora-training",
  "--tracker_run_name": "sd35m-1mp",
  "--report_to": "wandb",
  "--model_type": "lora",
  "--pretrained_model_name_or_path": "stabilityai/stable-diffusion-3.5-medium",
  "--model_family": "sd3",
  "--train_batch_size": 4,
  "--gradient_checkpointing": "true",
  "--gradient_accumulation_steps": 1,
  "--caption_dropout_probability": 0.1,
  "--resolution_type": "pixel_area",
  "--skip_file_discovery": false,
  "--resolution": 1024,
  "--validation_seed": 42,
  "--validation_steps": 5000,
  "--validation_resolution": "1024x1024",
  "--validation_negative_prompt": "ugly, cropped, blurry, low-quality, mediocre average",
  "--validation_guidance": 6.0,
  "--validation_guidance_rescale": "0.0",
  "--validation_num_inference_steps": "30",
  "--validation_prompt": "A photo-realistic image of a cat",
  "--mixed_precision": "bf16",
  "--optimizer": "bnb-adamw8bit",
  "--learning_rate": "5e-5",
  "--max_grad_norm": 0.1,
  "--grad_clip_method": "value",
  "--lr_scheduler": "constant_with_warmup",
  "--lr_warmup_steps": 10000,
  "--base_model_precision": "int8-quanto",
  "--vae_batch_size": 1,
  "--validation_torch_compile": "true",
  "--validation_lycoris_strength": 1.0,
  "--webhook_config": "config/sd3/webhook.json",
  "--compress_disk_cache": "false",
  "--evaluation_type": "clip",
  "--use_ema": true,
  "--ema_validation": "comparison",
  "--ema_update_interval": 25,
  "--delete_problematic_images": "true",
  "--disable_bucket_pruning": true,
  "--lora_rank": 128,
  "--lora_alpha": 128,
  "--flux_schedule_shift": 3,
  "--validation_prompt_library": true
}
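
The --flux_schedule_shift=3 option biases flow-matching timestep sampling toward the high-noise end. The shift transform commonly used for SD3-family models (a sketch of the standard formula, not SimpleTuner's exact code) looks like this:

```python
def shift_sigma(sigma: float, shift: float = 3.0) -> float:
    """Timestep shift for flow-matching models: maps a uniform
    sigma in [0, 1] toward the high-noise end of the schedule."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

# Endpoints are preserved; the midpoint moves toward 1 (more noise).
print(shift_sigma(0.0))  # 0.0
print(shift_sigma(0.5))  # 0.75
print(shift_sigma(1.0))  # 1.0
```

Spending more of the training budget at high noise levels tends to help global composition and large text layout, which matches this model's focus.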

If you wish to continue finetuning this model in particular, use --init_lora=/path/to/file.safetensors.

Place the following into config/sd3/lycoris_config.json

{
  "bypass_mode": true,
  "algo": "lokr",
  "multiplier": 1.0,
  "full_matrix": true,
  "linear_dim": 10000,
  "linear_alpha": 1,
  "factor": 4,
  "apply_preset": {
    "target_module": [
      "Attention",
      "FeedForward"
    ],
    "module_algo_map": {
      "FeedForward": {
        "factor": 4
      },
      "Attention": {
        "factor": 2
      }
    }
  }
}
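
The "factor" values above control how LoKr splits each weight into a Kronecker product, W ≈ kron(A, B), which is where the small adapter size comes from. A rough parameter-count sketch for the full_matrix case (the 1536 dimension is a hypothetical SD3.5M linear layer, and LyCORIS's real factorization helper handles non-divisible dims differently):

```python
def lokr_params(out_dim: int, in_dim: int, factor: int) -> int:
    """Parameter count for a full-matrix LoKr layer, W ~ kron(A, B):
    A is the small (factor x factor) Kronecker factor, B carries the rest.
    Assumes factor divides both dims cleanly (illustrative only)."""
    a_params = factor * factor
    b_params = (out_dim // factor) * (in_dim // factor)
    return a_params + b_params

full = 1536 * 1536               # dense weight: ~2.36M params
lokr = lokr_params(1536, 1536, 4)  # Kronecker factors: ~147k params
print(full, lokr, full / lokr)   # roughly a 16x reduction
```

A smaller factor (like the 2 used for Attention above) makes A smaller and B larger, so attention layers get more trainable capacity than feed-forward layers in this config.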

And for the dataset:

  • 154 T5 tokens

  • 77 CLIP tokens

  • Resolution: ~1024px-area aspect-bucketed data

  • Captions created by CogVLM and other language models

  • No particular focus on NSFW or anime; only high-quality photo data
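
The "pixel_area" resolution type with aspect bucketing (see --resolution_type, --resolution, and --aspect_bucket_rounding in the config above) means each image trains at dimensions that preserve its aspect ratio while targeting ~1024x1024 total pixels. A rough sketch of the idea, with the aspect ratio rounded to 2 decimals and dims snapped to a multiple of 64; SimpleTuner's actual bucketing logic differs in details:

```python
def bucket_for(width, height, target_area=1024 * 1024, rounding=2, step=64):
    """Pick training dims that keep the image's (rounded) aspect ratio
    while approximately matching the target pixel area, snapped to a
    multiple of `step` as diffusion latents require."""
    aspect = round(width / height, rounding)
    bucket_h = (target_area / aspect) ** 0.5
    bucket_w = bucket_h * aspect
    snap = lambda x: max(step, int(round(x / step)) * step)
    return snap(bucket_w), snap(bucket_h)

print(bucket_for(3000, 2000))  # landscape 3:2 source
print(bucket_for(1080, 1920))  # portrait 9:16 source
```

Rounding the aspect ratio keeps the number of distinct buckets small, so each batch can be filled with same-shaped images.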