Published | Jun 10, 2024 |
Training | Steps: 50,000 Epochs: 3 |
Hash | AutoV2 C891D98D86 |
New version is out: https://civitai.com/models/628865/sotediffusion-v2
Anime finetune of Würstchen V3.
This release is sponsored by fal.ai/grants
Trained on 6M images for 3 epochs using 8x A100 80G GPUs.
This model can be used via API with Fal.AI
For more details: https://fal.ai/models/fal-ai/stable-cascade/sote-diffusion
Please refer to Hugging Face for the SD.Next UI, Diffusers or UNet models:
https://huggingface.co/Disty0/sotediffusion-wuerstchen3
The CivitAI page hosts only the ComfyUI checkpoint models.
Inference Parameters:
Download the Main model (8.14 GB file):
https://civitai.com/api/download/models/563950?type=Model&format=SafeTensor&size=pruned&fp=fp16
Download the Decoder model (4.24 GB file):
https://civitai.com/api/download/models/563892?type=Model&format=SafeTensor&size=pruned&fp=fp16
Positives:
newest, extremely aesthetic, best quality,
Negatives:
very displeasing, worst quality, monochrome, realistic, oldest, loli,
Main:
Sampler: DDPM or DPMPP 2M with SGM Uniform
CFG: 7
Steps: 30 or 40
Decoder:
Sampler: Euler a Karras
CFG: 1 or 1.2
Steps: 10
Compression: 42 (or 32 to 64)
Resolution: 1024x1536, 2048x1152.
Anything works as long as both dimensions are a multiple of 128.
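As a rough illustration of the multiple-of-128 rule, the sketch below snaps an arbitrary target size to the nearest valid resolution. The helper names `snap_to_multiple` and `snap_resolution` are hypothetical and not part of any UI or library mentioned here.

```python
def snap_to_multiple(value: int, base: int = 128) -> int:
    """Round value to the nearest multiple of base, never going below base."""
    # Integer arithmetic avoids float rounding surprises.
    snapped = ((value + base // 2) // base) * base
    return max(base, snapped)


def snap_resolution(width: int, height: int, base: int = 128) -> tuple[int, int]:
    """Snap both dimensions to a multiple of base (128 per the note above)."""
    return snap_to_multiple(width, base), snap_to_multiple(height, base)


if __name__ == "__main__":
    # The recommended resolutions are already valid and stay unchanged.
    print(snap_resolution(1024, 1536))
    print(snap_resolution(2048, 1152))
    # A common 16:9 size that is not a multiple of 128 gets adjusted.
    print(snap_resolution(1920, 1080))
```

Snapping changes the aspect ratio slightly, so it may be worth checking the result against the intended framing before generating.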
Training:
Software used: Kohya SD-Scripts with Stable Cascade branch.
https://github.com/kohya-ss/sd-scripts/tree/stable-cascade
GPU used: 8x Nvidia A100 80GB
GPU hours: 220
Base
parameters | value
amp | bf16
weights | fp32
save weights | fp16
resolution | 1024x1024
effective batch size | 128
unet learning rate | 1e-5
te learning rate | 4e-6
optimizer | Adafactor
images | 6M
epochs | 3
Final
parameters | value
amp | bf16
weights | fp32
save weights | fp16
resolution | 1024x1024
effective batch size | 128
unet learning rate | 4e-6
te learning rate | none
optimizer | Adafactor
images | 120K
epochs | 16
Dataset:
GPU used for captioning: 1x Intel ARC A770 16GB
GPU hours: 350
Model used for captioning: SmilingWolf/wd-swinv2-tagger-v3
Model used for text: llava-hf/llava-1.5-7b-hf
Command:
python /mnt/DataSSD/AI/Apps/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py --model_dir "/mnt/DataSSD/AI/models/wd14_tagger_model" --repo_id "SmilingWolf/wd-swinv2-tagger-v3" --recursive --remove_underscore --use_rating_tags --character_tags_first --character_tag_expand --append_tags --onnx --caption_separator ", " --general_threshold 0.35 --character_threshold 0.50 --batch_size 4 --caption_extension ".txt" ./
dataset name | total images
newest | 1.85M
recent | 1.38M
mid | 993K
early | 566K
oldest | 160K
pixiv | 344K
visual novel cg | 231K
anime wallpaper | 105K
Total: 5,628,499 images
Note:
The smallest image size is 1280x600 (768,000 pixels)
Deduped based on image similarity using czkawka-cli
Around 120K very high quality images were intentionally duplicated 5 times, bringing the total image count to 6.2M
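The counts above can be cross-checked with a few lines of arithmetic. This is only a sketch: the per-subset numbers are the rounded figures from the table, and it assumes "duplicated 5 times" means 5 additional copies of each image, which is the reading that lands closest to 6.2M.

```python
# Rounded per-subset counts as listed in the dataset table.
subset_counts = {
    "newest": 1_850_000,
    "recent": 1_380_000,
    "mid": 993_000,
    "early": 566_000,
    "oldest": 160_000,
    "pixiv": 344_000,
    "visual novel cg": 231_000,
    "anime wallpaper": 105_000,
}

reported_total = 5_628_499          # exact total stated in the card
approx_total = sum(subset_counts.values())

duplicated_images = 120_000         # "around 120K" high quality images
extra_copies = 5                    # ASSUMPTION: 5 additional copies each

effective_total = reported_total + duplicated_images * extra_copies

if __name__ == "__main__":
    print(f"sum of rounded subsets: {approx_total:,}")
    print(f"reported total:         {reported_total:,}")
    print(f"with duplicates:        {effective_total:,}")
```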
Tags:
Tag Format:
The model is trained with random tag order, but for reference this is the tag order in the dataset:
aesthetic tags, quality tags, date tags, custom tags, rating tags, character, series, rest of the tags
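A minimal sketch of how a caption could be assembled in the dataset order listed above. The function name `build_caption`, its parameters and the example tags are hypothetical; the actual dataset tooling is not described in this card.

```python
def build_caption(
    aesthetic: str,
    quality: str,
    date: str,
    custom: list[str],
    rating: str,
    characters: list[str],
    series: list[str],
    rest: list[str],
) -> str:
    """Join tag groups in the documented dataset order, skipping empty ones."""
    ordered = [aesthetic, quality, date, *custom, rating, *characters, *series, *rest]
    # Same ", " separator as the tagging command used for the dataset.
    return ", ".join(tag for tag in ordered if tag)


if __name__ == "__main__":
    # Example tags are made up for illustration only.
    print(build_caption(
        aesthetic="very aesthetic",
        quality="high quality",
        date="newest",
        custom=[],
        rating="general",
        characters=[],
        series=[],
        rest=["1girl", "solo", "looking at viewer"],
    ))
```

Since training used random tag order, the order in a prompt should not matter much; the fixed order here only mirrors how the captions are stored.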
Date:
newest : 2022 to 2024
recent : 2019 to 2021
mid : 2015 to 2018
early : 2011 to 2014
oldest : 2005 to 2010
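The date ranges above map directly to a lookup. The following sketch shows one way to pick a date tag from a year; `date_tag` is a hypothetical helper, and returning `None` for years outside 2005 to 2024 is an assumption, since the card does not say how such images were handled.

```python
from typing import Optional

# (tag, first year, last year), both ends inclusive, as listed above.
DATE_RANGES = [
    ("newest", 2022, 2024),
    ("recent", 2019, 2021),
    ("mid", 2015, 2018),
    ("early", 2011, 2014),
    ("oldest", 2005, 2010),
]


def date_tag(year: int) -> Optional[str]:
    """Return the date tag for a year, or None if the year is not covered."""
    for tag, first, last in DATE_RANGES:
        if first <= year <= last:
            return tag
    return None


if __name__ == "__main__":
    for year in (2008, 2016, 2023):
        print(year, date_tag(year))
```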
Aesthetic Tags:
Model used: shadowlilac/aesthetic-shadow-2
score > 0.90 : extremely aesthetic
score > 0.80 : very aesthetic
score > 0.70 : aesthetic
score > 0.50 : slightly aesthetic
score > 0.40 : not displeasing
score > 0.30 : not aesthetic
score > 0.25 : slightly displeasing
score > 0.10 : displeasing
rest of them : very displeasing
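The threshold table above can be read as a top-down lookup: the first threshold the score exceeds decides the tag. A minimal sketch, assuming the score is a float in the 0 to 1 range; `aesthetic_tag` is a hypothetical name and the scoring model itself is not called here.

```python
# (threshold, tag) pairs from the table above, highest threshold first.
AESTHETIC_THRESHOLDS = [
    (0.90, "extremely aesthetic"),
    (0.80, "very aesthetic"),
    (0.70, "aesthetic"),
    (0.50, "slightly aesthetic"),
    (0.40, "not displeasing"),
    (0.30, "not aesthetic"),
    (0.25, "slightly displeasing"),
    (0.10, "displeasing"),
]


def aesthetic_tag(score: float) -> str:
    """Map an aesthetic score to its tag using strict '>' comparisons."""
    for threshold, tag in AESTHETIC_THRESHOLDS:
        if score > threshold:
            return tag
    # "rest of them" in the table.
    return "very displeasing"


if __name__ == "__main__":
    for score in (0.95, 0.45, 0.05):
        print(score, "->", aesthetic_tag(score))
```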
Quality Tags:
Model used: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth
score > 0.980 : best quality
score > 0.900 : high quality
score > 0.750 : great quality
score > 0.500 : medium quality
score > 0.250 : normal quality
score > 0.125 : bad quality
score > 0.025 : low quality
rest of them : worst quality
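For the quality tags, the same kind of lookup can be written as a binary search over the sorted thresholds. This is a sketch only; `quality_tag` is a hypothetical helper and the score is assumed to be a float produced by the quality model listed above.

```python
from bisect import bisect_left

# Thresholds in ascending order; QUALITY_TAGS[i] applies when exactly
# i thresholds are strictly below the score.
QUALITY_THRESHOLDS = [0.025, 0.125, 0.250, 0.500, 0.750, 0.900, 0.980]
QUALITY_TAGS = [
    "worst quality",
    "low quality",
    "bad quality",
    "normal quality",
    "medium quality",
    "great quality",
    "high quality",
    "best quality",
]


def quality_tag(score: float) -> str:
    """Map a quality score to its tag; a score equal to a threshold does not pass it."""
    # bisect_left counts the thresholds strictly smaller than score,
    # which matches the "score > threshold" rule in the table.
    return QUALITY_TAGS[bisect_left(QUALITY_THRESHOLDS, score)]


if __name__ == "__main__":
    for score in (0.99, 0.60, 0.01):
        print(score, "->", quality_tag(score))
```

Note that, per the table, "great quality" ranks below "high quality".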
Rating Tags:
general
sensitive
nsfw
explicit nsfw
Custom Tags:
image boards: date,
text: The text says "text",
characters: character, series
pixiv: art by Display_Name,
visual novel cg: Full_VN_Name (short_3_letter_name), visual novel cg,
anime wallpaper: date, anime wallpaper,
License
SoteDiffusion models fall under the Fair AI Public License 1.0-SD, which is compatible with the Stable Diffusion models' license. Key points:
1. Modification Sharing: If you modify SoteDiffusion models, you must share both your changes and the original license.
2. Source Code Accessibility: If your modified version is network-accessible, provide a way (like a download link) for others to get the source code. This applies to derived models too.
3. Distribution Terms: Any distribution must be under this license or another with similar rules.
4. Compliance: Non-compliance must be fixed within 30 days to avoid license termination, emphasizing transparency and adherence to open-source values.
Note: Anything not covered by the Fair AI license is inherited from the Stability AI Non-Commercial license.