Published | Feb 25, 2025 |
Training | Epochs: 1 |
Hash | AutoV2 02BACB2A85 |
*Important: please read this before using the model. This is very experimental, and most of the time the results are mediocre.*
Miso Diffusion M (Alpha) is an attempt to fine-tune Stable Diffusion 3.5 Medium on an anime dataset. In ComfyUI it uses as little as 2.4 GB of VRAM without the T5 text encoder. This version is fine-tuned on 250k images for just 1 epoch. A model trained on a larger dataset is still in preparation but faces some challenges: during my initial attempt to fine-tune on 700k images, the model would collapse after 2,000 steps. It is unclear why, as I found the first training run somewhat promising. Another issue is the nature of SD3.5, which aims to produce diverse outputs: even with the same seed you will get very different results, so it remains unclear whether the model can accurately generate specific characters.
Recommended settings: Euler sampler, CFG 4, 24-28 steps (DPM++ 2M also works, but I haven't tested it much); prompting: danbooru-style tagging. The model is also very picky about prompts, so I recommend simply generating with a batch size of 4 and picking the best result.
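The settings above can be collected in one place for reuse. The dict below holds the card's recommended values; the commented-out pipeline call is only a sketch that assumes the diffusers `StableDiffusion3Pipeline` API, not a tested recipe for this checkpoint.

```python
# Recommended sampling settings from the model card.
settings = {
    "sampler": "euler",   # DPM++ 2M also works, but is less tested
    "cfg_scale": 4.0,
    "steps": 26,          # recommended range: 24-28
    "batch_size": 4,      # generate 4 images and pick the best one
}

# Hypothetical inference sketch (assumes the diffusers SD3 pipeline;
# adjust the model path/loading to your setup):
# from diffusers import StableDiffusion3Pipeline
# pipe = StableDiffusion3Pipeline.from_pretrained("suzushi/miso-diffusion-m-test")
# images = pipe(
#     prompt,
#     guidance_scale=settings["cfg_scale"],
#     num_inference_steps=settings["steps"],
#     num_images_per_prompt=settings["batch_size"],
# ).images
```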
You can download the text encoders or the model at other training stages here; -000001 denotes the model after only 1 epoch: https://huggingface.co/suzushi/miso-diffusion-m-test
Quality tags
masterpiece, perfect quality, high quality, normal quality, low quality
Aesthetic tags
very aesthetic, aesthetic
Pleasant
very pleasant, pleasant, unpleasant
Additional tags: high resolution, elegant
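Since the model expects danbooru-style tagging with these quality and aesthetic tags, a small helper can prepend them to a subject tag list. The tag strings follow the card; the `build_prompt` function itself is purely illustrative, not part of any released tooling.

```python
# Tag groups taken from the model card (casing is illustrative).
QUALITY_TAGS = ["masterpiece", "perfect quality", "high quality"]
AESTHETIC_TAGS = ["very aesthetic"]
EXTRA_TAGS = ["high resolution", "elegant"]

def build_prompt(subject_tags):
    """Join quality, aesthetic, and subject tags into one comma-separated prompt."""
    return ", ".join(QUALITY_TAGS + AESTHETIC_TAGS + EXTRA_TAGS + subject_tags)

print(build_prompt(["1girl", "solo", "long hair"]))
```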
Training settings: Adafactor with a batch size of 40; lr_scheduler: constant with warmup
SD3.5-specific settings:
enable_scaled_pos_embed = true
pos_emb_random_crop_rate = 0.2
weighting_scheme = "flow"
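Put together, the training options above could be sketched as an sd-scripts-style TOML config. The optimizer, scheduler, batch size, and SD3.5-specific keys mirror the card; everything else is a placeholder to show where they would sit, not a reproduction of the actual run.

```toml
# Sketch of a training config using the settings from this card.
# Paths and unlisted hyperparameters are placeholders.
optimizer_type = "adafactor"
train_batch_size = 40
lr_scheduler = "constant_with_warmup"

# SD3.5-specific options (as listed above)
enable_scaled_pos_embed = true
pos_emb_random_crop_rate = 0.2
weighting_scheme = "flow"
```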
Most SD3.5 M training runs so far freeze all text encoders. I tried training CLIP with T5 cached, but it remains unclear whether this is the reason the subsequent training run failed.