Static images can be easily brought to life using ComfyUI and AnimateDiff.
🎥👉 Click here to watch the video tutorial
👉 Complete workflow with assets here
You will need the following custom nodes:
ControlNet Auxiliary Preprocessors
IPAdapter Plus (cubiq)
Advanced-ControlNet
AnimateDiff Evolved
ComfyUI Frame Interpolation
Video Helper Suite
rgthree's ComfyUI Nodes
Crystools
WAS Node Suite
And have the following models installed:
RealESRGAN x2
vae-ft-mse-840000-ema-pruned
Tile ControlNet
SparseCtrl Scribble ControlNet
IP-Adapter Plus (SD 1.5)
CLIP Vision for IP-Adapter (SD 1.5)
AnimateDiff v3 adapter (Lora)
AnimateDiff v3 motion model
AnimateLCM motion model
AnimateLCM adapter (Lora)
These custom nodes and models can be obtained using the Manager in ComfyUI, except for the AnimateLCM files, which can be downloaded here:
AnimateLCM - v1.0 | Stable Diffusion Motion | Civitai
AnimateLCM Lora - v1.0 | Stable Diffusion LoRA | Civitai
Use any SD 1.5 checkpoint you like. I personally use DreamShaper and Juggernaut quite frequently, but others should work too. Preferably, the checkpoint should match the style of the starting image.
Start the workflow by connecting two Lora loaders to the checkpoint: one for the AnimateLCM Lora, the other for the AnimateDiff v3 adapter Lora (needed later for SparseCtrl Scribble).
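If you prefer to assemble the graph programmatically, here is a minimal sketch of this step in ComfyUI's API (JSON) format, written as a Python dict. The checkpoint and Lora filenames are assumptions; use the names as they appear in your own models folders.

```python
# Sketch of the loader chain in ComfyUI API format.
# Filenames are placeholders -- match them to your models folders.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "dreamshaper_8.safetensors"}},
    # First Lora: AnimateLCM
    "2": {"class_type": "LoraLoader",
          "inputs": {"model": ["1", 0], "clip": ["1", 1],
                     "lora_name": "AnimateLCM_sd15_t2v_lora.safetensors",
                     "strength_model": 1.0, "strength_clip": 1.0}},
    # Second Lora: AnimateDiff v3 adapter
    "3": {"class_type": "LoraLoader",
          "inputs": {"model": ["2", 0], "clip": ["2", 1],
                     "lora_name": "v3_sd15_adapter.ckpt",
                     "strength_model": 1.0, "strength_clip": 1.0}},
}
```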
From there, construct the AnimateDiff setup using the Use Evolved Sampling node. Attach Context Options (preferably Looped Uniform) and load AnimateLCM t2v as the motion model. Keep the beta schedule on autoselect, or select lcm (or lcm >> sqrt_linear).
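Continuing the sketch, the AnimateDiff group might look like this. The ADE_* class and input names follow AnimateDiff Evolved's Gen2 nodes as I understand them; treat them as assumptions and verify against your installation.

```python
# AnimateDiff group (class/input names are assumptions -- check the
# AnimateDiff Evolved nodes in your install).
workflow.update({
    "4": {"class_type": "ADE_LoadAnimateDiffModel",
          "inputs": {"model_name": "AnimateLCM_sd15_t2v.ckpt"}},
    "5": {"class_type": "ADE_ApplyAnimateDiffModelSimple",
          "inputs": {"motion_model": ["4", 0]}},
    "6": {"class_type": "ADE_LoopedUniformContextOptions",
          "inputs": {"context_length": 16, "context_stride": 1,
                     "context_overlap": 4, "closed_loop": False}},
    "7": {"class_type": "ADE_UseEvolvedSampling",
          "inputs": {"model": ["3", 0], "m_models": ["5", 0],
                     "context_options": ["6", 0],
                     "beta_schedule": "autoselect"}},
})
```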
Connect the AnimateDiff group's model output to an IPAdapter Tiled node. Use the IPAdapter Unified Loader and select either the VIT-G or the PLUS preset.
Then add a Load Image node and load the image you want to animate. The resulting model output can be connected to the KSampler.
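A sketch of the IPAdapter branch, with class and preset names taken from cubiq's IPAdapter Plus as I recall them (assumptions; the preset strings in particular may differ in your version):

```python
# IPAdapter branch (names per IPAdapter Plus -- assumptions).
workflow.update({
    "8": {"class_type": "LoadImage",                  # the image to animate
          "inputs": {"image": "start_image.png"}},
    "9": {"class_type": "IPAdapterUnifiedLoader",
          "inputs": {"model": ["7", 0], "preset": "PLUS (high strength)"}},
    "10": {"class_type": "IPAdapterTiled",
           "inputs": {"model": ["9", 0], "ipadapter": ["9", 1],
                      "image": ["8", 0], "weight": 1.0}},
})
```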
Change your prompt to describe the scene.
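In API format, the prompts are ordinary CLIPTextEncode nodes fed by the CLIP output of the second Lora loader (the prompt text below is only an example):

```python
# Positive and negative prompts (example text).
workflow.update({
    "20": {"class_type": "CLIPTextEncode",
           "inputs": {"clip": ["3", 1],
                      "text": "a watercolor landscape, drifting clouds, gentle wind"}},
    "21": {"class_type": "CLIPTextEncode",
           "inputs": {"clip": ["3", 1],
                      "text": "blurry, low quality, watermark"}},
})
```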
Use two ControlNets: one is Tile and the other is SparseCtrl Scribble. Tile uses a normal ControlNet Loader, but for SparseCtrl Scribble you need to add the Sparse Control Loader, with SparseCtrl Scribble as the model.
Tile is fed the reference image directly. For SparseCtrl Scribble, run the image through a Fake Scribble Lines preprocessor first.
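The Tile half of this step could be sketched as follows; the loader and apply nodes are core ComfyUI, while the preprocessor class name comes from the Auxiliary Preprocessors pack (an assumption). The SparseCtrl loader and its apply node are omitted because their exact class names in Advanced-ControlNet vary between versions; they chain onto the conditioning the same way.

```python
# Tile ControlNet applied to the conditioning; the scribble lines for
# SparseCtrl come from the preprocessor. ControlNet filename and
# preprocessor class name are assumptions.
workflow.update({
    "11": {"class_type": "ControlNetLoader",
           "inputs": {"control_net_name": "control_v11f1e_sd15_tile.pth"}},
    "12": {"class_type": "ControlNetApplyAdvanced",   # Tile gets the raw image
           "inputs": {"positive": ["20", 0], "negative": ["21", 0],
                      "control_net": ["11", 0], "image": ["8", 0],
                      "strength": 1.0, "start_percent": 0.0,
                      "end_percent": 1.0}},
    "13": {"class_type": "FakeScribblePreprocessor",  # lines for SparseCtrl
           "inputs": {"image": ["8", 0], "resolution": 512}},
})
```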
For the latent, the example uses 512x768, but a landscape or standard 512x512 latent also works. The total number of frames is 36 (which becomes 72 after interpolation).
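With AnimateDiff, the frame count is simply the batch size of the empty latent:

```python
# 512x768 latent, 36 frames (frames = latent batch size).
workflow.update({
    "14": {"class_type": "EmptyLatentImage",
           "inputs": {"width": 512, "height": 768, "batch_size": 36}},
})
```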
At this point, your workflow is ready. Adjust the KSampler settings for AnimateLCM: to start, use 8 steps, CFG = 1.2, the lcm sampler, and the sgm_uniform scheduler. Keep denoise at 1.0.
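The corresponding KSampler node, using the settings above (the seed is arbitrary):

```python
# KSampler tuned for AnimateLCM.
workflow.update({
    "15": {"class_type": "KSampler",
           "inputs": {"model": ["10", 0],
                      "positive": ["12", 0], "negative": ["12", 1],
                      "latent_image": ["14", 0],
                      "seed": 42, "steps": 8, "cfg": 1.2,
                      "sampler_name": "lcm", "scheduler": "sgm_uniform",
                      "denoise": 1.0}},
})
```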
If you want to preview the animation, use a Video Combine node or an Animation node.
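A preview branch might decode the latents and hand them to Video Helper Suite's Video Combine node; the VHS input names below are assumptions. The checkpoint's built-in VAE is used here, but you can load vae-ft-mse-840000-ema-pruned with a VAELoader instead.

```python
# Decode frames and combine them into a previewable video.
workflow.update({
    "16": {"class_type": "VAEDecode",
           "inputs": {"samples": ["15", 0], "vae": ["1", 2]}},
    "17": {"class_type": "VHS_VideoCombine",   # input names are assumptions
           "inputs": {"images": ["16", 0], "frame_rate": 12,
                      "loop_count": 0, "filename_prefix": "animatediff",
                      "format": "video/h264-mp4"}},
})
```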
The amount of motion in the animation can be fine-tuned with the scale multival input in the AnimateDiff group; for example, increase it to 1.2 for more movement.
The resulting animation can be further refined with a second KSampler pass (with a denoise of 0.4-0.7), and the resolution can then be increased with Image Upscale and frame interpolation.
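Once assembled, the workflow can be queued on a locally running ComfyUI instance through its standard /prompt HTTP endpoint (default port 8188):

```python
import json
import urllib.request
import uuid

# Queue the assembled workflow via ComfyUI's /prompt endpoint.
payload = {"prompt": workflow, "client_id": str(uuid.uuid4())}
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())  # returns the prompt_id on success
```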