| Field | Value |
| --- | --- |
| Published | Jan 20, 2025 |
| Training | Steps: 295,000 · Epochs: 3 |
| Usage Tips | Strength: 1 |
| Hash | AutoV2 EAD76FE90E |
Welcome to Terminus Relay!
This model series is the culmination of millions of training steps burnt into the Stable Diffusion 3.5 abyss. The name Relay refers to the hand-off of the baton in the race to build a usable SD 3.5 Medium ecosystem.
Currently, a 3.5 Medium version is available in LyCORIS LoKr format. The adapter is reasonably small at 350 MiB, so it should run on most consumer hardware!
The 3.5 Medium model was selected due to the truly challenging nature of training it, and the promising potential in a 2.6B-parameter, 16-channel-VAE model. If we could get it to work, well, great things could come from this foundation!
The v1 version of Terminus Relay has roughly 55,000 steps of finetuning on very high-quality photos, cinematic still extracts, and typography images to improve the model's understanding of text.
The meme potential of this model is high!
v2 has reached 295,000 steps of finetuning on mostly high-quality images containing a lot of text (signs, handwriting on paper, etc.) and ~28k stock photos pulled from a pre-AI-era dataset (none of which had watermarks).
The dataset size was actually reduced between v1 and v2 to focus the model more on composition and prompt adherence than on direct anatomical improvements.
The v1 model may be more creative, but the v2 model is more stable. The earlier version requires a higher CFG of around 5-8, while the newer one requires a lower CFG of 2-4.
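The CFG ranges above feed the standard classifier-free guidance combination, where the conditional prediction is extrapolated away from the unconditional one. A minimal sketch in plain Python (illustrative only, not the sampler's actual tensor code):

```python
def cfg_combine(uncond, cond, guidance_scale):
    """Classifier-free guidance: move the unconditional prediction
    toward (and past) the conditional one by `guidance_scale`."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

# At guidance 1.0 the result is exactly the conditional prediction;
# larger values (e.g. 5-8 for v1, 2-4 for v2) extrapolate further,
# trading diversity for prompt adherence.
```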
Training details
SimpleTuner configuration
This goes into `config/sd3/config.json`:
```json
{
  "--resume_from_checkpoint": "latest",
  "--quantize_via": "cpu",
  "--data_backend_config": "config/sd3/multidatabackend.json",
  "--aspect_bucket_rounding": 2,
  "--seed": 42,
  "--minimum_image_size": 0,
  "--disable_benchmark": false,
  "--output_dir": "output/sd3",
  "--lora_type": "lycoris",
  "--lycoris_config": "config/sd3/lycoris_config.json",
  "--max_train_steps": 300000,
  "--num_train_epochs": 0,
  "--checkpointing_steps": 5000,
  "--checkpoints_total_limit": 5,
  "--hub_model_id": "sd35m-photo-1mp",
  "--push_to_hub": "true",
  "--push_checkpoints_to_hub": "true",
  "--tracker_project_name": "lora-training",
  "--tracker_run_name": "sd35m-1mp",
  "--report_to": "wandb",
  "--model_type": "lora",
  "--pretrained_model_name_or_path": "stabilityai/stable-diffusion-3.5-medium",
  "--model_family": "sd3",
  "--train_batch_size": 4,
  "--gradient_checkpointing": "true",
  "--gradient_accumulation_steps": 1,
  "--caption_dropout_probability": 0.1,
  "--resolution_type": "pixel_area",
  "--skip_file_discovery": false,
  "--resolution": 1024,
  "--validation_seed": 42,
  "--validation_steps": 5000,
  "--validation_resolution": "1024x1024",
  "--validation_negative_prompt": "ugly, cropped, blurry, low-quality, mediocre average",
  "--validation_guidance": 6.0,
  "--validation_guidance_rescale": "0.0",
  "--validation_num_inference_steps": "30",
  "--validation_prompt": "A photo-realistic image of a cat",
  "--mixed_precision": "bf16",
  "--optimizer": "bnb-adamw8bit",
  "--learning_rate": "5e-5",
  "--max_grad_norm": 0.1,
  "--grad_clip_method": "value",
  "--lr_scheduler": "constant_with_warmup",
  "--lr_warmup_steps": 10000,
  "--base_model_precision": "int8-quanto",
  "--vae_batch_size": 1,
  "--validation_torch_compile": "true",
  "--validation_lycoris_strength": 1.0,
  "--webhook_config": "config/sd3/webhook.json",
  "--compress_disk_cache": "false",
  "--evaluation_type": "clip",
  "use_ema": true,
  "ema_validation": "comparison",
  "ema_update_interval": 25,
  "--delete_problematic_images": "true",
  "--disable_bucket_pruning": true,
  "--lora_rank": 128,
  "--lora_alpha": 128,
  "--flux_schedule_shift": 3,
  "--validation_prompt_library": true
}
```
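The `constant_with_warmup` scheduler paired with `--lr_warmup_steps: 10000` ramps the learning rate linearly from zero and then holds it flat at `5e-5` for the rest of training. A sketch of the multiplier:

```python
def constant_with_warmup(step, warmup_steps=10_000):
    """LR multiplier: linear ramp 0 -> 1 over `warmup_steps`, then flat at 1."""
    if step < warmup_steps:
        return step / warmup_steps
    return 1.0

# e.g. halfway through warmup the model trains at half the base LR;
# from step 10,000 to 300,000 the LR stays constant.
```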
If you wish to continue finetuning this model in particular, add `--init_lora=/path/to/file.safetensors` to your configuration.
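In the JSON config form used above, that flag would presumably appear as an extra key (assuming `--init_lora` takes the adapter path as its value):

```json
{
  "--init_lora": "/path/to/file.safetensors"
}
```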
Place the following into `config/sd3/lycoris_config.json` (the path referenced by `--lycoris_config` above):
```json
{
  "bypass_mode": true,
  "algo": "lokr",
  "multiplier": 1.0,
  "full_matrix": true,
  "linear_dim": 10000,
  "linear_alpha": 1,
  "factor": 4,
  "apply_preset": {
    "target_module": [
      "Attention",
      "FeedForward"
    ],
    "module_algo_map": {
      "FeedForward": {
        "factor": 4
      },
      "Attention": {
        "factor": 2
      }
    }
  }
}
```
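The `factor` values control how aggressively the LoKr Kronecker factorization compresses each weight matrix. A rough parameter-count sketch, assuming a square small factor and evenly divisible dimensions (the 1536 width is an assumption about SD3.5 Medium's hidden size, for illustration only):

```python
def lokr_param_count(out_dim, in_dim, factor):
    """Rough LoKr size estimate: W (out x in) ~= kron(A, B) with
    A of shape (factor, factor) and B of shape (out//factor, in//factor).
    Assumes both dims divide evenly by `factor` (illustrative only)."""
    a_params = factor * factor
    b_params = (out_dim // factor) * (in_dim // factor)
    return a_params + b_params

dense = 1536 * 1536                      # one dense projection at the assumed width
attn = lokr_param_count(1536, 1536, 2)   # Attention uses factor 2 above
ff = lokr_param_count(1536, 1536, 4)     # FeedForward uses factor 4 above
# larger factors shrink B quadratically, so FeedForward blocks end up
# much smaller than Attention blocks relative to the dense weight
```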
And for the dataset:
- 154 T5 tokens
- 77 CLIP tokens
- ~1024px-area, aspect-bucketed data
- Captions generated by CogVLM and other language models
- No particular focus on NSFW or anime; only high-quality photo data
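Those two limits mean each caption is cut to a different length per text encoder; a minimal sketch with stand-in token-id lists (the actual tokenizers are omitted):

```python
CLIP_MAX = 77   # CLIP sequence budget used for this dataset
T5_MAX = 154    # T5 sequence budget used for this dataset

def fit_caption(token_ids, limit):
    """Truncate a tokenized caption to an encoder's token budget."""
    return token_ids[:limit]

long_caption = list(range(300))             # stand-in for a long tokenized caption
clip_ids = fit_caption(long_caption, CLIP_MAX)
t5_ids = fit_caption(long_caption, T5_MAX)
# the T5 branch keeps roughly twice as much of a long caption as CLIP does
```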