Published | Mar 3, 2025 |
Base Model | Stable Diffusion 3.5 Medium |
Training | Epochs: 2 |
Hash | AutoV2 629D9F8874 |
*Important: please read this before using the model. It is very experimental, as I am still trying to find the optimal settings, and it will slowly exit beta as training becomes more stable.
You can download the CLIP text encoder here: https://huggingface.co/suzushi/miso-diffusion-m-1.0
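If you prefer scripting the download, here is a minimal sketch using huggingface_hub (my suggestion, not the author's instructions); it fetches the whole repository into the local cache:

```python
# Minimal sketch: download the text encoder repo with huggingface_hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="suzushi/miso-diffusion-m-1.0")
print(local_dir)  # local folder now containing the repo files
```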
I will also write two articles soon on the details of the model.
Miso Diffusion M 1.0 is an attempt to fine-tune Stable Diffusion 3.5 Medium on an anime dataset. In ComfyUI it uses as little as 2.4 GB of VRAM without the T5 text encoder. This version is a step up from the previous version (beta): it was trained on the same 160k images for 3 more epochs, then fine-tuned on 600k images for another 2 epochs. (Two epochs were chosen because further training caused it to generate more artifacts and blurry images.)
Recommended settings: Euler, CFG 5, 28-40 steps, denoise 0.95 or 1.
Prompting: Danbooru-style tagging. I recommend simply generating with a batch size of 4 to 8 and picking the best one (see the sketch below). The model struggles with hands and complex poses; you can add "upper body" to the prompt so it doesn't generate a full-body shot.
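For illustration, a minimal inference sketch with the diffusers library rather than ComfyUI, applying the recommended settings above. The local checkpoint path and prompt are hypothetical, and it assumes the model has been converted to diffusers format; passing `text_encoder_3=None` is diffusers' documented way of dropping T5 for SD3 pipelines to save VRAM:

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Hypothetical path to a diffusers-format conversion of the checkpoint.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "./miso-diffusion-m-1.0",
    text_encoder_3=None,   # drop T5-XXL, as the card suggests
    tokenizer_3=None,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()  # further reduces peak VRAM

images = pipe(
    "masterpiece, very aesthetic, 1girl, upper body",  # Danbooru-style tags
    negative_prompt="low quality, unpleasant",
    num_inference_steps=28,     # recommended 28-40 steps
    guidance_scale=5.0,         # recommended CFG 5
    num_images_per_prompt=4,    # batch of 4, pick the best one
    height=1024,
    width=1024,
).images
for i, img in enumerate(images):
    img.save(f"sample_{i}.png")
```

The default SD3 scheduler in diffusers is a flow-matching Euler scheduler, which matches the recommended Euler sampler.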
Quality tags
masterpiece, perfect quality, high quality, normal quality, low quality
Aesthetic tags
very aesthetic, aesthetic
Pleasant tags
very pleasant, pleasant, unpleasant
Additional tags: high resolution, elegant
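Purely as an illustration of how these tag groups might be combined into a single front-loaded prompt (the subject tags here are hypothetical):

```python
# Illustrative only: assembling a prompt from the tag groups above.
quality = "masterpiece"            # strongest quality tag
aesthetic = "very aesthetic"
pleasant = "very pleasant"
extra = "high resolution, elegant"
subject = "1girl, upper body"      # "upper body" helps avoid full-body shots

prompt = ", ".join([quality, aesthetic, pleasant, extra, subject])
negative = "low quality, unpleasant"
print(prompt)
```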
Training was done at 1024x1024, though since the model natively supports 1440, certain prompts work at 1440x1440 as well.
Training was done on a GH200 with 96 GB of VRAM.
Training settings: Adafactor with a batch size of 40, lr_scheduler: cosine
SD3.5-specific settings:
enable_scaled_pos_embed = true
pos_emb_random_crop_rate = 0.2
weighting_scheme = "flow"
learning_rate = 3e-6
learning_rate_te1 = 2e-6
learning_rate_te2 = 2e-6
Train CLIP: true, train T5-XXL: false
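For reference, a sketch of how these values might sit together in a kohya sd-scripts style TOML config. Keys not quoted above (optimizer_type, train_batch_size, lr_scheduler, and the text-encoder toggles) follow common sd-scripts conventions and are assumptions, not the author's actual file:

```toml
# Sketch combining the settings above; unquoted key names are assumptions.
optimizer_type = "adafactor"
train_batch_size = 40
lr_scheduler = "cosine"
learning_rate = 3e-6
learning_rate_te1 = 2e-6   # CLIP-L
learning_rate_te2 = 2e-6   # CLIP-G
enable_scaled_pos_embed = true
pos_emb_random_crop_rate = 0.2
weighting_scheme = "flow"
train_text_encoder = true  # train CLIP; flag name is an assumption
train_t5xxl = false        # keep T5-XXL frozen; flag name is an assumption
```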
Developing a base model is costly, so if you like my work, please consider a donation. Thanks a lot: https://ko-fi.com/suzushi2024