
Miso-diffusion-M (Alpha)

Type: Checkpoint Trained (SafeTensor, verified)
Published: Feb 25, 2025
Base Model: SD 3.5 Medium
Training: 1 epoch
Hash (AutoV2): 02BACB2A85
This Stability AI Model is licensed under the Stability AI Community License, Copyright (c) Stability AI Ltd. All Rights Reserved.
Powered by Stability AI

*Important: please read this before using the model. It is very experimental, and most of the time the results are mediocre.

Miso Diffusion M (Alpha) is an attempt to fine-tune Stable Diffusion 3.5 Medium on an anime dataset. In ComfyUI it uses as little as 2.4 GB of VRAM without the T5 text encoder. This version is fine-tuned on 250k images for just 1 epoch. A model trained on a larger dataset is still in preparation but faces some challenges: during my initial attempt to fine-tune on 700k images, the model would collapse after 2,000 steps, and it is unclear why, as I found the first training run somewhat promising. Another issue is the nature of SD3.5, which aims to provide diverse output, meaning that even with the same seed you can get very different results, so it remains unclear whether the model can accurately generate specific characters.

Recommended settings: Euler, CFG 4, 24-28 steps (DPM++ 2M also works, but I haven't done much testing); prompt: danbooru-style tagging. The model is also very picky about prompts, so I recommend simply generating with a batch size of 4 and picking the best result.

You can download the text encoder, or the model at other stages, here; -000001 denotes the model trained for only 1 epoch. https://huggingface.co/suzushi/miso-diffusion-m-test

Quality tag

masterpiece, perfect quality, high quality, normal quality, low quality

Aesthetic Tag

very aesthetic, aesthetic

Pleasant

very pleasant, pleasant, unpleasant

Additional tags: high resolution, elegant
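As a rough illustration, the quality, aesthetic, and additional tags above can be joined with ordinary danbooru-style subject tags into a single prompt string. The helper below is hypothetical (not part of the model or any library), just a sketch of how the tag groups compose:

```python
# Hypothetical helper: assemble a danbooru-style prompt from tag groups.
# Tag names are taken from the lists above; the function itself is illustrative.

def build_prompt(subject_tags, quality="masterpiece", aesthetic="very aesthetic",
                 extra=("high resolution",)):
    """Join subject, quality, aesthetic, and extra tags into one comma-separated prompt."""
    parts = list(subject_tags) + [quality, aesthetic] + list(extra)
    return ", ".join(parts)

prompt = build_prompt(["1girl", "solo", "long hair"])
print(prompt)
# -> 1girl, solo, long hair, masterpiece, very aesthetic, high resolution
```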

Training settings: Adafactor with a batch size of 40; lr_scheduler: constant with warmup

SD3.5-specific settings:

enable_scaled_pos_embed = true

pos_emb_random_crop_rate = 0.2

weighting_scheme = "flow"
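Put together, the training options above would sit in a kohya-ss sd-scripts style TOML config roughly like this. This is a minimal sketch, not the author's actual config; the key names for the optimizer, batch size, and scheduler are assumptions based on sd-scripts conventions, while the three SD3.5-specific keys are copied from the values above:

```toml
# Hypothetical sd-scripts style fragment (key names assumed, values from this page)
optimizer_type = "Adafactor"
train_batch_size = 40
lr_scheduler = "constant_with_warmup"

# SD3.5-specific settings as listed above
enable_scaled_pos_embed = true
pos_emb_random_crop_rate = 0.2
weighting_scheme = "flow"
```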

Most SD3.5 Medium training so far freezes all text encoders. I tried training CLIP with T5 cached, but it remains unclear whether this is why the subsequent training failed.