Published | Feb 27, 2025 |
Training | Epochs: 3 |
Hash | AutoV2 D7D7F41A14 |
*Important: please read this before using the model. It is very experimental, as I am still trying to find the optimal settings.*
You can download the CLIP text encoder here: https://huggingface.co/suzushi/miso-diffusion-m-beta
Miso Diffusion M (Beta) is an attempt to fine-tune Stable Diffusion 3.5 Medium on an anime dataset. In ComfyUI it uses as little as 2.4 GB of VRAM without the T5 text encoder. This version is a step up from the previous version (alpha): it was trained on 160k images for 3 epochs to see how the model adapts to anime.
Recommended settings: Euler sampler, CFG 5, 28-40 steps (DPM++ 2M also works, but I haven't tested it much). Prompting: danbooru-style tagging. I recommend simply generating with a batch size of 4 to 8 and picking the best one.
Quality tags
masterpiece, perfect quality, high quality, normal quality, low quality
Aesthetic tags
very aesthetic, aesthetic
Pleasant tags
very pleasant, pleasant, unpleasant
Additional tags: high resolution, elegant
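As a concrete illustration of the tagging scheme above, here is a small helper sketch (my own illustration; `build_prompt` and its defaults are not part of the model release) that assembles a danbooru-style prompt from quality, aesthetic, and additional tags:

```python
# Hypothetical helper: prepends quality/aesthetic/additional tags to the
# subject tags to form a comma-separated danbooru-style prompt.
def build_prompt(subject_tags, quality="masterpiece", aesthetic="very aesthetic",
                 extra=("high resolution",)):
    return ", ".join([quality, aesthetic, *extra, *subject_tags])

print(build_prompt(["1girl", "solo", "blue hair"]))
# -> masterpiece, very aesthetic, high resolution, 1girl, solo, blue hair
```

Swap in lower quality/aesthetic tags (e.g. "low quality", "unpleasant") for the negative prompt if your workflow uses one.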
Training was done on a GH200. I switched the LR scheduler to cosine this time.
Training settings: Adafactor optimizer with a batch size of 40, lr_scheduler: cosine
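For readers unfamiliar with the cosine scheduler mentioned above, here is a minimal sketch (my illustration, not the trainer's code) of how it decays the learning rate from its peak down to a minimum over the run:

```python
import math

# Cosine learning-rate schedule: starts at lr_max at step 0 and decays
# smoothly along a half-cosine to lr_min at total_steps.
def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    progress = step / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# Halfway through training the LR sits at the midpoint of lr_max and lr_min.
```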
SD3.5-specific settings:
enable_scaled_pos_embed = true
pos_emb_random_crop_rate = 0.2
weighting_scheme = "flow"
Train CLIP: true, Train t5xxl: false
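Collected into one place, the settings above might look like the following config fragment in kohya sd-scripts-style TOML (an assumption: the post doesn't name the training tool, and only the keys quoted above appear verbatim; the remaining key names are illustrative):

```toml
# Reconstructed from the settings listed above; key names other than the
# three SD3.5-specific ones quoted in the post are assumptions.
optimizer_type = "Adafactor"
train_batch_size = 40
lr_scheduler = "cosine"

# SD3.5-specific settings quoted in the post:
enable_scaled_pos_embed = true
pos_emb_random_crop_rate = 0.2
weighting_scheme = "flow"

# Text encoders: CLIP is trained, T5-XXL stays frozen.
train_clip = true    # assumed key for "Train Clip: true"
train_t5xxl = false  # assumed key for "Train t5xxl: false"
```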