Sign In

Miso-diffusion-M-1.1

12

92

0

Updated: Mar 27, 2025

base modelanime

Verified:

SafeTensor

Type

Checkpoint Trained

Stats

92

0

Reviews

Published

Mar 27, 2025

Base Model

SD 3.5 Medium

Training

Steps: 89,000
Epochs: 5

Hash

AutoV2
817A0E9EBE

This Stability AI Model is licensed under the Stability AI Community License, Copyright (c) Stability AI Ltd. All Rights Reserved.

Powered by Stability AI

*Important, please read this before using the model because this is very experimental, I think the training direction is right but it will struggle with hand and complex pose at the moment.

You can download the clip text encoder here: https://huggingface.co/suzushi/miso-diffusion-m-1.1

Though I merged the clip encoder as well so you can use it directly (without t5).

You can use it like this in comfy ui if you prefer to use t5 :

This version is trained with 710k image for 5 epoch.

Roughly 261 hours, running a epoch cost around 80 dollars, training a base model is costly, so if possible please consider supporting this project, the cost to develop SD3.5 medium model is slightly lower then sdxl, thousand dollar would allow me to train a million image class model for 10 epoch: https://ko-fi.com/suzushi2024, if there's any money left it will go towards lora training in the future.

Development roadmap:

The larger dataset is still in processing, 2.0 aims to be utilize 1.2-1.5 million image dataset and 3.0 aimed at 2.5 million.

Opensource commitment:

This is a little tool I used to prepare my dataset: https://github.com/suzushi-tw/wd14-toolkit

Roughly 570 dollars was spend on processing the danbooru dataset, each tar file comes with image pair with its txt file as well as characters_list, feel free to use the dataset:

https://huggingface.co/suzushi

Recommanded setting, euler, cfg:5 , 32-36 steps, (denoise: 0.95 or 1 )

prompt: danbooru style tagging. I recommand simply generating with a batch size of 4 to 8 and pick the best one.

Quality tag

Masterpiece, Perfect Quality, High quality, Normal Quality, Low quality

Aesthetic Tag

Very Aesthetic, aesthetic

Pleasent

Very pleasent, pleasent, unpleasent

Additional tag: high resolution, elegant

Slight increase in learning rate, the model is far from finished,

every epoch still shows significant improvement.

Training setting: Adafactor with a batchsize of 40, lr_scheduler: cosine

SD3.5 Specific setting:

enable_scaled_pos_embed = true

pos_emb_random_crop_rate = 0.2

weighting_scheme = "flow"

learning_rate = 3.5e-6

learning_rate_te1 = 2.5e-6

learning_rate_te2 = 2.5e-6

Train Clip: true, Train t5xxl: false