Please see our <a target="_blank" rel="ugc" href="https://education.civitai.com/getting-started-with-stable-diffusion-3-5/">Quickstart Guide to Stable Diffusion 3.5</a> for all the latest info! <a target="_blank" rel="ugc" href="https://stability.ai/news/introducing-stable-diffusion-3-5">Stable Diffusion 3.5 Medium</a> is a Multimodal Diffusion Transformer with improvements (MMDiT-X) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource efficiency. Please note: This model is released under the <a target="_blank" rel="ugc" href="https://stability.ai/community-license-agreement">Stability Community License</a>. Visit <a target="_blank" rel="ugc" href="https://stability.ai/license">Stability AI</a> to learn more or <a target="_blank" rel="ugc" href="https://stability.ai/enterprise">contact us</a> for commercial licensing details.
<h3 id="model-description-a4930wjuz">Model Description</h3>
<ul><li>Developed by: Stability AI</li>
<li>Model type: MMDiT-X text-to-image generative model</li>
<li>Model Description: This model generates images based on text prompts. It is a Multimodal Diffusion Transformer (<a target="_blank" rel="ugc" href="https://arxiv.org/abs/2403.03206">https://arxiv.org/abs/2403.03206</a>) with improvements: it uses three fixed, pretrained text encoders, QK-normalization to improve training stability, and dual attention blocks in the first 12 transformer layers.</li></ul>
<h3 id="license-nwvob2qt5">License</h3>
<ul><li>Community License: Free for research, non-commercial, and commercial use for organizations or individuals with less than $1M in total annual revenue. More details can be found in the <a target="_blank" rel="ugc" href="https://stability.ai/community-license-agreement">Community License Agreement</a>.
Read more at <a target="_blank" rel="ugc" href="https://stability.ai/license">https://stability.ai/license</a>.</li>
<li>For individuals and organizations with annual revenue above $1M: please <a target="_blank" rel="ugc" href="https://stability.ai/enterprise">contact us</a> to get an Enterprise License.</li></ul>
<h3 id="implementation-details-2f2shbmwa">Implementation Details</h3>
<ul><li>MMDiT-X: Introduces self-attention modules in the first 13 layers of the transformer, enhancing multi-resolution generation and overall image coherence.</li>
<li>QK Normalization: Implements the QK normalization technique to improve training stability.</li>
<li>Mixed-Resolution Training:<ul><li>Progressive training stages: 256 → 512 → 768 → 1024 → 1440 resolution</li>
<li>The final stage included mixed-scale image training to boost multi-resolution generation performance</li>
<li>Extended positional embedding space to 384x384 (latent) at lower resolution stages</li>
<li>Employed random crop augmentation on positional embeddings to enhance transformer layer robustness across the entire range of mixed resolutions and aspect ratios.
For example, given a 64x64 latent image, we add a randomly cropped 64x64 embedding from the 192x192 embedding space during training as the input to the x stream.</li></ul></li></ul>
These enhancements collectively contribute to the model's improved performance in multi-resolution image generation, coherence, and adaptability across various text-to-image tasks.
<ul><li>Text Encoders:<ul><li>CLIPs: <a target="_blank" rel="ugc" href="https://github.com/mlfoundations/open_clip">OpenCLIP-ViT/G</a>, <a target="_blank" rel="ugc" href="https://github.com/openai/CLIP/tree/main">CLIP-ViT/L</a>, context length 77 tokens</li>
<li>T5: <a target="_blank" rel="ugc" href="https://huggingface.co/google/t5-v1_1-xxl">T5-xxl</a>, context length 77/256 tokens at different stages of training</li></ul></li>
<li>Training Data and Strategy: This model was trained on a wide variety of data, including synthetic data and filtered publicly available data.</li></ul>
For more technical details of the original MMDiT architecture, please refer to the <a target="_blank" rel="ugc" href="https://stability.ai/news/stable-diffusion-3-research-paper">research paper</a>.
<h3 id="usage-and-limitations-4i6uedbkm">Usage &amp; Limitations</h3>
<ul><li>While this model can handle long prompts, you may observe artifacts at the edges of generated images when a prompt exceeds 256 T5 tokens. Pay attention to the token limits when using this model in your workflow, and shorten prompts if the artifacts become too obvious.</li>
<li>The medium model has a different training data distribution than the large model, so it may not respond to the same prompt in the same way.</li>
<li>We recommend sampling with <a target="_blank" rel="ugc" href="https://github.com/comfyanonymous/ComfyUI/pull/5404">Skip Layer Guidance</a> for better structure and anatomy coherence.</li></ul>
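The random crop augmentation on positional embeddings described under Implementation Details can be illustrated with a small sketch. This is not Stability AI's implementation: it assumes the positional embeddings live in a square table addressed in row-major order, and the function names (`random_crop_origin`, `cropped_position_ids`) are hypothetical. It only shows how a 64x64 window of position indices might be drawn at a random offset from a 192x192 grid.

```python
import random


def random_crop_origin(grid_size, crop_h, crop_w, rng):
    """Pick the top-left corner of a crop window that fits inside the grid."""
    if crop_h > grid_size or crop_w > grid_size:
        raise ValueError("crop does not fit inside the embedding grid")
    top = rng.randint(0, grid_size - crop_h)    # randint is inclusive on both ends
    left = rng.randint(0, grid_size - crop_w)
    return top, left


def cropped_position_ids(grid_size, crop_h, crop_w, rng):
    """Return a crop_h x crop_w grid of flat (row-major) indices into a
    grid_size x grid_size positional embedding table (assumed layout)."""
    top, left = random_crop_origin(grid_size, crop_h, crop_w, rng)
    return [
        [(top + r) * grid_size + (left + c) for c in range(crop_w)]
        for r in range(crop_h)
    ]


# Example from the text: a 64x64 latent drawing positions from a 192x192 space.
rng = random.Random(0)
ids = cropped_position_ids(192, 64, 64, rng)
```

Each entry of `ids` would then select one row of the embedding table, so the same latent size sees different absolute positions from one training step to the next.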

Stable Diffusion 3.5 Medium
