
Miso-diffusion-M-1.0

Updated: Mar 3, 2025
base model
Verified: SafeTensor
Type: Checkpoint Trained
Stats: 12
Published: Mar 3, 2025
Base Model: SD 3.5 Medium
Training: Epochs: 2
Hash (AutoV2): 629D9F8874
This Stability AI Model is licensed under the Stability AI Community License, Copyright (c) Stability AI Ltd. All Rights Reserved.
Powered by Stability AI

*Important: please read this before using the model. It is very experimental, as I am still trying to find the optimal settings, and it will slowly exit beta as the training becomes more stable.

You can download the clip text encoder here: https://huggingface.co/suzushi/miso-diffusion-m-1.0

I will also write two articles on the details of the model soon.

Miso Diffusion M 1.0 is an attempt to fine-tune Stable Diffusion 3.5 Medium on an anime dataset. In ComfyUI it uses as little as 2.4 GB of VRAM without the T5 text encoder. This version is a step up from the previous version (beta): it was trained on the same 160k images for 3 more epochs, then fine-tuned on 600k images for another 2 epochs. (2 was chosen because further training would cause it to generate more artifacts and blurry images.)

Recommended settings: Euler, CFG: 5, 28-40 steps (denoise: 0.95 or 1)
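For anyone scripting generation outside ComfyUI, the settings above can be collected in one place. This is only a sketch: the key names mirror the call arguments of the diffusers `StableDiffusion3Pipeline` (`num_inference_steps`, `guidance_scale`, `num_images_per_prompt`, `height`, `width`), and whether this checkpoint loads there is an assumption that has not been verified here. The helper name `sampling_kwargs` is made up for illustration, and the ComfyUI denoise value has no direct counterpart in this sketch.

```python
def sampling_kwargs(steps=28, cfg=5.0, batch_size=4, size=1024):
    """Collect the recommended settings as keyword arguments.

    Hypothetical helper: defaults follow the page (CFG 5, 28-40 steps,
    batch of 4 to 8, 1024x1024). Euler is not set here because it is a
    sampler/scheduler choice rather than a call argument.
    """
    return {
        "num_inference_steps": steps,
        "guidance_scale": cfg,
        "num_images_per_prompt": batch_size,
        "height": size,
        "width": size,
    }


settings = sampling_kwargs(steps=32)
print(settings)
# Assumed usage, not verified against this checkpoint:
#   images = pipe(prompt="...", **settings).images
```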

Prompt: Danbooru-style tagging. I recommend simply generating with a batch size of 4 to 8 and picking the best one. The model will struggle with hands and complex poses; you can add upper body so it doesn't generate a full body.

Quality tag

Masterpiece, Perfect Quality, High quality, Normal Quality, Low quality

Aesthetic Tag

Very Aesthetic, aesthetic

Pleasent

Very pleasent, pleasent, unpleasent

Additional tag: high resolution, elegant
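As a rough illustration of how these tags might be combined into one Danbooru-style prompt, here is a small sketch. The helper `build_prompt` is hypothetical, and putting the quality and aesthetic tags before the subject tags is an assumption, not something the page specifies. Tags are kept exactly as spelled above.

```python
def build_prompt(subject_tags, quality="Masterpiece",
                 aesthetic="Very Aesthetic", extra=("high resolution",)):
    """Join quality, aesthetic, extra, and subject tags with commas.

    Hypothetical helper: the tag order is an assumption. Empty tags and
    case-insensitive duplicates are dropped, keeping the first occurrence.
    """
    seen = set()
    kept = []
    for tag in (quality, aesthetic, *extra, *subject_tags):
        tag = tag.strip()
        key = tag.lower()
        if tag and key not in seen:
            seen.add(key)
            kept.append(tag)
    return ", ".join(kept)


# "upper body" is added so the model does not attempt a full-body pose.
prompt = build_prompt(["1girl", "upper body", "smile", "outdoors"])
print(prompt)
```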

Training was done at 1024x1024, though since the model natively supports 1440, certain prompts will work at 1440x1440 as well.

Training was done on a GH200 with 96 GB of VRAM.

Training settings: Adafactor with a batch size of 40, lr_scheduler: cosine
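To get a feel for the scale of the run, the batch size can be combined with the dataset sizes and epoch counts mentioned earlier. This is back-of-the-envelope arithmetic only: it assumes the datasets are exactly 160,000 and 600,000 images, that batch size 40 was used in both stages, and that there was no gradient accumulation. None of this is confirmed by the page, and `optimizer_steps` is a made-up helper name.

```python
import math

BATCH_SIZE = 40  # from the training settings above


def optimizer_steps(num_images, epochs, batch_size=BATCH_SIZE):
    """Approximate optimizer steps: batches per epoch times epochs."""
    return math.ceil(num_images / batch_size) * epochs


# Assumed exact sizes; the page only says "160k" and "600k".
stage1_steps = optimizer_steps(160_000, epochs=3)  # 3 more epochs on 160k images
stage2_steps = optimizer_steps(600_000, epochs=2)  # 2 epochs on 600k images
print(stage1_steps, stage2_steps)
```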

SD3.5-specific settings:

enable_scaled_pos_embed = true

pos_emb_random_crop_rate = 0.2

weighting_scheme = "flow"

learning_rate = 3e-6

learning_rate_te1 = 2e-6

learning_rate_te2 = 2e-6

Train Clip: true, Train t5xxl: false
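For a sense of what a cosine schedule does to the learning rates listed above, here is a minimal sketch. It assumes plain cosine decay with no warmup that reaches zero at the final step; the trainer's actual cosine scheduler may differ. The total of 30,000 steps is a hypothetical figure (600k images at batch size 40 for 2 epochs, with no gradient accumulation), not a number stated by the author.

```python
import math


def cosine_lr(step, total_steps, base_lr):
    """Cosine decay from base_lr at step 0 to 0 at total_steps (no warmup)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 30_000  # hypothetical, see the assumptions above
BASE_LR = 3e-6        # learning_rate
TE_LR = 2e-6          # learning_rate_te1 / learning_rate_te2

for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, cosine_lr(step, TOTAL_STEPS, BASE_LR), cosine_lr(step, TOTAL_STEPS, TE_LR))
```

Under these assumptions both rates fall to roughly half their starting value at the midpoint of training.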

Developing a base model is costly, so if you like my work, please consider donating. Thanks a lot: https://ko-fi.com/suzushi2024