Published | Feb 25, 2025 |
Training | Epochs: 1 |
Hash | AutoV2 02BACB2A85 |
*Important: please read this before using the model. This is very experimental, and most of the time the results are mediocre.*
Miso Diffusion M (Alpha) is an attempt to fine-tune Stable Diffusion 3.5 Medium on an anime dataset. In ComfyUI it uses as little as 2.4 GB of VRAM without the T5 text encoder. This version is fine-tuned on 250k images for just 1 epoch. A model trained on a larger dataset is still in preparation but faces some challenges: during my initial attempt to fine-tune on 700k images, the model would collapse after 2,000 steps. It is unclear why, as I found the first training run somewhat promising. Another issue is the nature of SD3.5, which aims to produce diverse outputs: even with the same seed you will get very different results, so it remains unclear whether the model can accurately generate specific characters.
Recommended settings: Euler sampler, CFG 4, 24-28 steps (DPM++ 2M also works, but I haven't tested it much); prompting: danbooru-style tagging. The model is also very picky about prompts, so I recommend simply generating with a batch size of 4 and picking the best result.
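The settings above can be collected in one place for reuse. The dict below holds the card's recommended values; the commented-out pipeline call is only a sketch that assumes the diffusers `StableDiffusion3Pipeline` API, not a tested recipe for this checkpoint.

```python
# Recommended sampling settings from the model card.
settings = {
    "sampler": "euler",   # DPM++ 2M also works, but is less tested
    "cfg_scale": 4.0,
    "steps": 26,          # recommended range: 24-28
    "batch_size": 4,      # generate 4 images and pick the best one
}

# Hypothetical inference sketch (assumes the diffusers SD3 pipeline;
# adjust the model path/loading to your setup):
# from diffusers import StableDiffusion3Pipeline
# pipe = StableDiffusion3Pipeline.from_pretrained("suzushi/miso-diffusion-m-test")
# images = pipe(
#     prompt,
#     guidance_scale=settings["cfg_scale"],
#     num_inference_steps=settings["steps"],
#     num_images_per_prompt=settings["batch_size"],
# ).images
```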
You can download the text encoders or the model at other training stages here; -000001 denotes the model after only 1 epoch: https://huggingface.co/suzushi/miso-diffusion-m-test
Quality tags
masterpiece, perfect quality, high quality, normal quality, low quality
Aesthetic tags
very aesthetic, aesthetic
Pleasant
very pleasant, pleasant, unpleasant
Additional tags: high resolution, elegant
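Since the model expects danbooru-style tagging with these quality and aesthetic tags, a small helper can prepend them to a subject tag list. The tag strings follow the card; the `build_prompt` function itself is purely illustrative, not part of any released tooling.

```python
# Tag groups taken from the model card (casing is illustrative).
QUALITY_TAGS = ["masterpiece", "perfect quality", "high quality"]
AESTHETIC_TAGS = ["very aesthetic"]
EXTRA_TAGS = ["high resolution", "elegant"]

def build_prompt(subject_tags):
    """Join quality, aesthetic, and subject tags into one comma-separated prompt."""
    return ", ".join(QUALITY_TAGS + AESTHETIC_TAGS + EXTRA_TAGS + subject_tags)

print(build_prompt(["1girl", "solo", "long hair"]))
```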
Training settings: Adafactor with a batch size of 40; lr_scheduler: constant with warmup
SD3.5-specific settings:
enable_scaled_pos_embed = true
pos_emb_random_crop_rate = 0.2
weighting_scheme = "flow"
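Put together, the training options above could be sketched as an sd-scripts-style TOML config. The optimizer, scheduler, batch size, and SD3.5-specific keys mirror the card; everything else is a placeholder to show where they would sit, not a reproduction of the actual run.

```toml
# Sketch of a training config using the settings from this card.
# Paths and unlisted hyperparameters are placeholders.
optimizer_type = "adafactor"
train_batch_size = 40
lr_scheduler = "constant_with_warmup"

# SD3.5-specific options (as listed above)
enable_scaled_pos_embed = true
pos_emb_random_crop_rate = 0.2
weighting_scheme = "flow"
```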
Most SD3.5 M training runs so far freeze all text encoders. I tried training CLIP with T5 cached, but it remains unclear whether this is the reason the subsequent training run failed.