stable video diffusion img2vid

Verified: SafeTensor
Type: Checkpoint Trained
Stats: 7,544 downloads, 0 reviews
Published: Nov 26, 2023
Base Model: Other
Training Steps: 12,000
Usage Tips: Clip Skip: 2
Trigger Words: video
Hash (AutoV2): 3E0994626D

License:

(((NOT MY MODEL))) Stable Video Diffusion (SVD) Image-to-Video is a latent diffusion model that takes a still image as a conditioning frame and generates a short video clip from it. The model was trained to generate 25 frames at a resolution of 576x1024 given a context frame of the same size, finetuned from SVD Image-to-Video [14 frames]. The widely used f8-decoder was also finetuned for temporal consistency.

Real repo: stabilityai/stable-video-diffusion-img2vid-xt at main (huggingface.co)

Stable Video Diffusion is a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video synthesis. To construct its pretraining dataset, we conduct a systematic data selection and scaling study, and propose a method to curate vast amounts of video data and turn large and noisy video collections into suitable datasets for generative video models. Furthermore, we introduce three distinct stages of video model training which we separately analyze to assess their impact on the final model performance. Stable Video Diffusion provides a powerful video representation from which we finetune video models for state-of-the-art image-to-video synthesis and other highly relevant applications such as LoRAs for camera control. Finally, we provide a pioneering study on multi-view finetuning of video diffusion models and show that SVD constitutes a strong 3D prior, which obtains state-of-the-art results in multi-view synthesis while using only a fraction of the compute of previous methods.