
WAN 2.2 IMAGE to VIDEO with Caption and Postprocessing


Type: Workflows


Published: Jul 30, 2025

Base Model: Wan Video 14B i2v 480p

Hash (AutoV2): 654BC64DC2

Author: tremolo28

Workflow: Image -> Autocaption (Prompt) -> WAN I2V with Upscale and Frame Interpolation and Video Extension

  • Creates video clips at 480p or 720p resolution.

There is a Florence caption version and an LTX Prompt Enhancer (LTXPE) version; LTXPE is heavier on VRAM.
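For orientation, the Florence caption step corresponds roughly to a standard Florence-2 detailed-caption call. The sketch below is a standalone illustration; the model ID, task token, and image path are my own example values, not taken from the workflow:

```python
# Minimal Florence-2 autocaption sketch (illustrative; not the workflow's exact node).
# Model ID, task token, and image path are assumptions for demonstration.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-large"          # assumed checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).to(device)

image = Image.open("input.png").convert("RGB")   # the image you want to animate
task = "<MORE_DETAILED_CAPTION>"                 # Florence-2 task token for long captions

inputs = processor(text=task, images=image, return_tensors="pt").to(device)
ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
)
raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
caption = processor.post_process_generation(raw, task=task, image_size=image.size)[task]
print(caption)   # this text becomes the prompt for the I2V sampler
```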


V1.0: WAN 2.2 14B Image to Video workflow with LightX2v LoRA support for low step counts (4-8 steps)

  • Wan 2.2 uses two models to process a clip: a high-noise and a low-noise model, run in sequence (see the sketch after this list).

  • Compatible with the LightX2v LoRA from Wan 2.1 for fast, low-step generation.

  • Compatible with some Wan 2.1 LoRAs; each LoRA must be injected twice because of the two-model setup.

  • See notes in workflow.

  • GGUF models

  • A 5-second clip at 6 steps @ 480p takes about 4 minutes, including autoprompt, 2x upscaling to 960p, and frame interpolation to 30 fps (RTX 4080 16 GB VRAM, 64 GB RAM, sage attention).
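Conceptually, the two-model setup splits the denoising schedule: the high-noise model handles the early, noisier steps and the low-noise model finishes the rest. Below is a minimal toy sketch of that split; the function names, step counts, and switch point are illustrative assumptions, not the ComfyUI API:

```python
# Toy sketch of the Wan 2.2 two-stage split (not the ComfyUI API).
# The high-noise model handles the early steps, the low-noise model the rest;
# the step counts and switch point below are illustrative assumptions.
import torch

def denoise_step(model, latent, total_steps):
    # Stand-in for one sampler step; a real sampler would apply the noise schedule here.
    return latent - model(latent) / total_steps

def run_two_stage(high_noise_model, low_noise_model, latent, total_steps=6, switch_at=3):
    for step in range(total_steps):
        # Early, noisier steps go to the high-noise model, later steps to the low-noise model.
        model = high_noise_model if step < switch_at else low_noise_model
        latent = denoise_step(model, latent, total_steps)
    return latent

# Toy stand-ins for the two GGUF checkpoints:
high = torch.nn.Linear(16, 16)
low = torch.nn.Linear(16, 16)
out = run_two_stage(high, low, torch.randn(1, 16))

# Because two models are involved, a Wan 2.1 LoRA such as LightX2v has to be
# applied to both of them -- hence the "inject twice" note above.
```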

Models can be downloaded here:

Models (low- and high-noise models both required; pick the quantization matching your VRAM): https://huggingface.co/bullerwins/Wan2.2-I2V-A14B-GGUF/tree/main

LightX2v Lora (same as Wan 2.1): https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/tree/main/loras

Vae (same as Wan 2.1): https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/vae

Textencoder (same as Wan 2.1): https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/text_encoders
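If you prefer scripting the downloads, here is a hedged sketch using huggingface_hub's snapshot_download. The quantization pattern and target folders are my assumptions, and downloaded files keep their repo subpaths under local_dir, so you may need to move them afterwards:

```python
# Scripted download sketch using huggingface_hub (pip install huggingface_hub).
# The Q4_K_M pattern is only an example -- pick the quantization matching your VRAM.
from huggingface_hub import snapshot_download

# High- and low-noise 14B GGUF models
snapshot_download(
    repo_id="bullerwins/Wan2.2-I2V-A14B-GGUF",
    allow_patterns=["*Q4_K_M*.gguf"],            # example pattern; adjust to your VRAM
    local_dir="ComfyUI/models/unet",
)

# LightX2v LoRA (same as Wan 2.1)
snapshot_download(
    repo_id="lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v",
    allow_patterns=["loras/*"],
    local_dir="ComfyUI/models/loras",
)

# VAE and text encoder from the Comfy-Org repackaged repos
snapshot_download(
    repo_id="Comfy-Org/Wan_2.1_ComfyUI_repackaged",
    allow_patterns=["split_files/vae/*"],
    local_dir="ComfyUI/models/vae",
)
snapshot_download(
    repo_id="Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
    allow_patterns=["split_files/text_encoders/*"],
    local_dir="ComfyUI/models/clip",
)
# Note: files keep their repo subpaths (e.g. split_files/vae/...) inside local_dir,
# so move them up into the plain models folders listed further below.
```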


The Wan 2.2 14B Image to Video MultiClip workflow lets you create clips and extend them up to 20 seconds; see the example videos.

Still experimental: it supports the LightX2v LoRA but no other LoRAs yet (those nodes are bypassed placeholders for now).
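The extension idea, roughly: each additional clip is generated from the last frame of the previous clip, and the clips are then concatenated. The toy sketch below illustrates this with stand-in functions; none of these names are real ComfyUI nodes:

```python
# Toy sketch of the MultiClip extension idea (stand-in functions, not the ComfyUI API).
# Each new clip starts from the last frame of the previous one; the clips are then
# concatenated into a single video. Frame sizes/counts here are dummy values.
import numpy as np

def wan22_i2v(start_frame, prompt, num_frames=81):
    # Stand-in for the I2V pass: pretend it returns `num_frames` generated frames.
    return np.stack([start_frame] * num_frames)

def extend_video(start_image, prompt, num_clips=4):
    clips = []
    current = start_image
    for _ in range(num_clips):                  # e.g. 4 clips x ~5 s each -> ~20 s total
        clip = wan22_i2v(current, prompt)
        clips.append(clip)
        current = clip[-1]                      # feed the last frame into the next clip
    return np.concatenate(clips, axis=0)

video = extend_video(np.zeros((64, 64, 3), dtype=np.uint8), "example prompt")
print(video.shape)                              # (num_clips * num_frames, H, W, C)
```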


WAN 2.2 I2V 5B model (GGUF) workflow with Florence or LTXPE auto caption

  • Lower quality than the 14B model and currently slower (there is no LightX2v LoRA for it).

  • 720p @ 24 fps

Model (GGUF; load the one matching your VRAM): https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/tree/main

VAE: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/vae

Textencoder (same as Wan 2.1): https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/text_encoders


Locations to save these files within your ComfyUI folder (a quick sanity check follows the list):

Wan GGUF Model -> models/unet

Textencoder -> models/clip

Vae -> models/vae
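A small sanity check for that layout; the ComfyUI path is an assumption, so point it at your own install:

```python
# Quick check that the downloaded files ended up in the expected ComfyUI folders.
# The ComfyUI path is an assumption -- point it at your own install.
from pathlib import Path

comfy = Path("ComfyUI")
expected = {
    "Wan GGUF model": comfy / "models/unet",
    "Textencoder":    comfy / "models/clip",
    "Vae":            comfy / "models/vae",
}

for name, folder in expected.items():
    files = list(folder.glob("*")) if folder.exists() else []
    status = f"{len(files)} file(s)" if files else "MISSING"
    print(f"{name:15} -> {folder}  [{status}]")
```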


Tips:

  • The default strength of 0.8 for the LightX2v Wan 2.1 LoRA is set up for a more realistic look; hair and skin look more natural. For an anime or comic-like look you can increase the strength to 1.0 or beyond (black nodes in the workflow). A sketch of how strength scales the LoRA update follows.
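For intuition, the strength value scales the low-rank update that the LoRA adds to the base weights, so 0.8 keeps slightly more of the base model's look than 1.0. A minimal sketch of that scaling with toy shapes, not the real model:

```python
# How LoRA strength scales the injected update (toy shapes, illustrative only).
import torch

def apply_lora(weight, lora_A, lora_B, strength=0.8):
    # W' = W + strength * (B @ A): 0.8 keeps a bit more of the base model's look
    # (more natural skin/hair); 1.0 or higher pushes further toward the LoRA's style.
    return weight + strength * (lora_B @ lora_A)

# Example: a 1024x1024 layer with a rank-32 LoRA.
W = torch.randn(1024, 1024)
A = torch.randn(32, 1024)    # down-projection
B = torch.randn(1024, 32)    # up-projection
W_realistic = apply_lora(W, A, B, strength=0.8)
W_stylized  = apply_lora(W, A, B, strength=1.0)
```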