Published | Mar 6, 2024 |
Training | Epochs: 40 |
Hash | AutoV2 D41A1BC696 |
The first high-quality anime-style model for Stable Cascade is here. SomniumSC aims to be the Waifu Diffusion of Stable Cascade. A diffusers version can also be found on our Hugging Face.
On CivitAI there are 2 files for each weight size: the fine-tuned Stage C model and the fine-tuned text encoder (which is in a zip). Download both and extract the zip file to get the .safetensors files so the model can be used in ComfyUI; the instructions are below. If you want to use our model in diffusers 🧨, check our Hugging Face repo, which includes code showing how to use it.
Say goodbye to negative prompts, "word salad" in your positive prompt, and the hassle of captioning. Starting from SomniumSC v1.1, you don't need any prompt adjustment to generate stunning images, and captioning is much simpler. Our model can generate good images even without a negative prompt. You can still use a negative prompt when undesired items appear in the image, such as elf ears or a random halo.
You can support me on Ko-Fi
__________________________________________________________________________________________________
SomniumSC is a model fine-tuned from Stability AI's all-new Stable Cascade (or, we could say, Würstchen v3) in a 2D (cartoonish) style, trained on the Stage C 3.6B model. The text encoder was also trained to generate the 2D style, so the model can generate not only from booru-tag prompts but also from natural language.
The model uses the same dataset size and method as AnySomniumXL v2: 33,000+ images curated from hundreds of thousands of images from various sources. The dataset is built by keeping images that have an aesthetic score of at least 19 and at most 50 (to keep the model cartoonish and not too realistic; the scale is based on our proprietary aesthetic scoring mechanism) and that contain no text or watermarks, such as signatures or comic/manga pages. Images that fall outside this score range, or that contain watermarks or text, are discarded.
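The curation rule above can be sketched in a few lines of Python. This is only an illustration, not our actual pipeline: the `Candidate` record, its field names, and the sample file names are hypothetical, and the scores are assumed to have been computed already. The 19 to 50 range is the one stated above.

```python
from dataclasses import dataclass

# Score range stated in the description above.
MIN_SCORE = 19.0
MAX_SCORE = 50.0


@dataclass
class Candidate:
    """Hypothetical record for one scraped image and its precomputed checks."""
    path: str
    aesthetic_score: float
    has_watermark: bool = False
    has_text: bool = False


def keep(candidate: Candidate) -> bool:
    """True if the image is inside the score range and free of watermarks and text."""
    in_range = MIN_SCORE <= candidate.aesthetic_score <= MAX_SCORE
    return in_range and not candidate.has_watermark and not candidate.has_text


def curate(candidates: list[Candidate]) -> list[Candidate]:
    """Return only the candidates that pass every check, in their original order."""
    return [c for c in candidates if keep(c)]


# Made-up sample data for illustration.
samples = [
    Candidate("a.png", 25.0),
    Candidate("b.png", 18.0),                      # below the range
    Candidate("c.png", 60.0),                      # above the range (too realistic)
    Candidate("d.png", 30.0, has_watermark=True),  # watermark
    Candidate("e.png", 19.0),                      # boundary value is kept
    Candidate("f.png", 50.0, has_text=True),       # text in the image
]
kept = curate(samples)
```

Both boundaries are treated as inclusive here, matching "at least 19 and at most 50".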
SomniumSC Technical Specification:
Trained for 40 epochs (the released SomniumSC results use epoch 40)
Captioned by a proprietary multimodal LLM, better than LLaVA
Trained with bucket sizes of 1024x1024 and 1536x1536 (multi-resolution)
Shuffle Caption: Yes
Clip Skip: 0
Trained with 1x NVIDIA A100 80GB
The dataset was created using a combination of the CLIP model and the MLP scoring method by christophschuhmann, modified by us. It uses ViT-L/14 to produce aesthetic scores on a scale of -1 to 100, with our own watermark detection added on top.
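To make the scoring idea concrete, here is a rough pure-Python sketch of an MLP head applied to a CLIP image embedding. It is not our modified scorer: the layer sizes, the random toy weights, the ReLU on hidden layers, and the clamping to the -1 to 100 scale are all assumptions made for illustration. The only fixed fact used is that CLIP ViT-L/14 image embeddings have 768 dimensions.

```python
import math
import random

EMBED_DIM = 768  # size of a CLIP ViT-L/14 image embedding


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def linear(vec, weights, bias):
    """Dense layer: one output per (weight row, bias) pair."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def aesthetic_score(embedding, layers, lo=-1.0, hi=100.0):
    """Run a normalized embedding through an MLP head and clamp to [lo, hi].

    The clamp range mirrors the scale in the text; clamping itself is an assumption.
    """
    hidden = l2_normalize(embedding)
    for i, (weights, bias) in enumerate(layers):
        hidden = linear(hidden, weights, bias)
        if i < len(layers) - 1:
            hidden = [max(0.0, h) for h in hidden]  # ReLU on hidden layers only
    return min(hi, max(lo, hidden[0]))


def random_layer(rng, n_in, n_out):
    """Toy weights for demonstration; a real scorer would load trained ones."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
toy_layers = [random_layer(rng, EMBED_DIM, 16), random_layer(rng, 16, 1)]
toy_embedding = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]  # stand-in for a CLIP output
score = aesthetic_score(toy_embedding, toy_layers)
```

In practice the embedding would come from a CLIP ViT-L/14 image encoder and the weights from a trained checkpoint; the watermark detector would run as a separate check.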
Achievements in SomniumSC v1.1:
✓ Produces a more 2D style from natural language by default, without the need for excessive negative or positive prompts
✓ More likely to produce better fingers than the average Stable Diffusion model, without ADetailer or inpainting
✓ Produces a more authentic 2D look without the need for negative prompts like realistic
✓ Does not produce images with random watermarks or text
✓ Can produce better text, even compared with AnySomniumXL v3.5.1
✓ Goodbye to “negative prompts”. You no longer need to use a negative prompt to prevent bad images unless there is an unwanted object
✓ Produces better colour than SomniumSC v1
✓ Much simpler captioning
Compared with SDXL-based models, this Stable Cascade model produces better fingers, hands, feet, and fine character detail, holds objects much better, and can generate at up to 1536px. If you dare, you can generate at up to 2048px with this model.
Limitations:
✓ Still requires broader dataset training for more variation of poses and style
✓ Generated text is limited to a maximum of 2 words
✓ This model is optimized for generating humans or mutated humans. Non-humans such as SCP, ponies, and more may not turn out as you expect
✓ Faces may look compressed. Generating the image at 1536px may give better results
A smaller half-size version and a Stable Cascade Lite version will be released soon
How to use SomniumSC:
Currently, Stable Cascade is only supported by ComfyUI, but you can also use our demo
You can follow the tutorial here or here
To make it clear which models you should download, here is where to download each one directly
For stage A you can download from here
For stage B you can download from here
For stage C you can download the safetensors on CivitAI or our huggingface repo
For the text encoder you can download from our Hugging Face repo
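If you prefer diffusers over ComfyUI, the flow looks roughly like the sketch below. Treat it as an outline under assumptions rather than our official snippet: `SOMNIUM_PRIOR_REPO` is a hypothetical placeholder (use the real repo id from our Hugging Face page), the code assumes that repo loads as a Stable Cascade prior (Stage C) pipeline, and the guidance and step values are common Stable Cascade example settings, not tuned recommendations. The 2048px cap reflects the upper limit mentioned above.

```python
# Hypothetical placeholder: replace with the actual SomniumSC diffusers repo id.
SOMNIUM_PRIOR_REPO = "your-namespace/SomniumSC-diffusers"
# Stability AI's decoder repo (Stage B and Stage A) on Hugging Face.
DECODER_REPO = "stabilityai/stable-cascade"


def build_prior_kwargs(prompt: str, negative_prompt: str = "", size: int = 1024) -> dict:
    """Collect Stage C (prior) arguments; the negative prompt defaults to empty."""
    if not 0 < size <= 2048:
        raise ValueError("size should be at most 2048px (1024 or 1536 recommended)")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "height": size,
        "width": size,
        "guidance_scale": 4.0,       # example value, not a tuned recommendation
        "num_inference_steps": 20,   # example value
    }


def generate(prompt: str, negative_prompt: str = "", size: int = 1024):
    """Run Stage C, then Stage B/A. Needs torch, diffusers, and a CUDA GPU."""
    import torch
    from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

    prior = StableCascadePriorPipeline.from_pretrained(
        SOMNIUM_PRIOR_REPO, torch_dtype=torch.bfloat16
    ).to("cuda")
    decoder = StableCascadeDecoderPipeline.from_pretrained(
        DECODER_REPO, torch_dtype=torch.float16
    ).to("cuda")

    # Stage C turns the prompt into image embeddings.
    prior_out = prior(**build_prior_kwargs(prompt, negative_prompt, size))
    # Stage B and Stage A decode the embeddings into the final image.
    result = decoder(
        image_embeddings=prior_out.image_embeddings.to(torch.float16),
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=0.0,
        num_inference_steps=10,
        output_type="pil",
    )
    return result.images[0]
```

Calling `generate("a girl reading a book under a tree", size=1536)` would return a PIL image. The heavy imports sit inside `generate` so the argument helper can be checked without a GPU.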
SomniumSC Pro tips:
If the model produces pointy ears on the character, just add elf or pointy ears to the negative prompt.
If the model produces a "compressed face", use 1536px resolution so the model can render the face clearly.
Disclaimer:
This model is under the STABILITY AI NON-COMMERCIAL RESEARCH COMMUNITY LICENSE, which means this model cannot be sold and derivative works cannot be commercialized. The exception, as far as I know, is that you can buy a StabilityAI membership here to commercialize your derivative works based on this model. Please support StabilityAI so they can keep providing open-source models for us. You can still merge our model freely.