Workflow: Image -> Autocaption (Prompt) -> WAN I2V with Upscale and Frame Interpolation and Video Extension
Creates video clips at 480p or 720p resolution.
There is a Florence caption version and an LTX Prompt Enhancer (LTXPE) version; LTXPE is heavier on VRAM.
V1.0 WAN 2.2 14B Image to Video workflow with LightX2v Lora support for low step counts (4-8 steps)
Wan 2.2 uses two models to process a clip: a High Noise and a Low Noise model, run in sequence.
Compatible with the LightX2v Lora from Wan 2.1 to process clips fast with low step counts.
Compatible with some of the Wan 2.1 Loras; these must be injected twice due to the two-model setup.
See notes in workflow.
GGUF models
A 5-second clip with 6 steps @ 480p takes about 4 minutes, including autoprompt, 2x upscaling to 960p, and frame interpolation to 30fps (RTX 4080 with 16GB VRAM and 64GB RAM, sage attention).
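For orientation, the numbers in the timing note above can be checked with simple arithmetic. The 16 fps base rate is an assumption about the WAN 14B default, not stated above:

```python
# Rough arithmetic behind the pipeline stages described above.
# Assumption: the 14B model generates at WAN's native 16 fps.
clip_seconds = 5
native_fps = 16

# WAN clip lengths follow a 4n+1 frame count
frames_generated = clip_seconds * native_fps + 1  # 81 frames

# 2x spatial upscale: 480p -> 960p, as in the note above
base_height = 480
upscaled_height = base_height * 2  # 960

# Frame interpolation roughly doubles the frame rate;
# 2 * 16 = 32 fps, typically resampled/played back as ~30 fps
interpolated_fps = native_fps * 2

print(frames_generated, upscaled_height, interpolated_fps)
```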
Models can be downloaded here:
Models (Low & High Noise required; pick the ones matching your VRAM): https://huggingface.co/bullerwins/Wan2.2-I2V-A14B-GGUF/tree/main
LightX2v Lora (same as Wan 2.1): https://huggingface.co/lightx2v/Wan2.1-I2V-14B-480P-StepDistill-CfgDistill-Lightx2v/tree/main/loras
Vae (same as Wan 2.1): https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/vae
Textencoder (same as Wan 2.1): https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/text_encoders
Wan 2.2 14B Image to Video MultiClip workflow lets you create clips and extend them up to 20 seconds; see the example videos.
Still experimental; it supports the LightX2v Lora but no other Loras (the nodes are bypassed, placeholders for now).
WAN 2.2 I2V 5B Model (GGUF) workflow with Florence or LTXPE auto caption
Lower quality than the 14B model and currently slower (there is no LightX Lora for it)
720p @ 24 fps
Model (GGUF; load the model matching your VRAM): https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/tree/main
VAE: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/vae
Textencoder (same as Wan 2.1): https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/text_encoders
Where to save these files within your ComfyUI folder:
Wan GGUF Model -> models/unet
Textencoder -> models/clip
Vae -> models/vae
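The downloads above can be scripted. The following is a hypothetical helper sketch, not part of the workflow itself: it assumes the huggingface_hub package is installed, that COMFYUI_DIR points at your ComfyUI install, and the placeholder filenames must be replaced with the actual files you pick from the repos linked above.

```python
# Hypothetical download helper for the model files listed above.
from pathlib import Path

COMFYUI_DIR = Path("ComfyUI")  # assumption: adjust to your install location

# (repo_id, filename within the repo, target subfolder)
# Placeholder filenames in <angle brackets> -- pick the actual High and Low
# Noise GGUFs matching your VRAM, plus VAE and text encoder, from the repos.
FILES = [
    ("bullerwins/Wan2.2-I2V-A14B-GGUF", "<high-noise>.gguf", "models/unet"),
    ("bullerwins/Wan2.2-I2V-A14B-GGUF", "<low-noise>.gguf", "models/unet"),
    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
     "split_files/vae/<vae-file>.safetensors", "models/vae"),
    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
     "split_files/text_encoders/<encoder-file>.safetensors", "models/clip"),
]

def target_dir(subfolder: str) -> Path:
    """Create (if needed) and return the ComfyUI model folder."""
    d = COMFYUI_DIR / subfolder
    d.mkdir(parents=True, exist_ok=True)
    return d

def download_all() -> None:
    # Imported here so the path logic above works without the dependency.
    from huggingface_hub import hf_hub_download
    for repo_id, filename, subfolder in FILES:
        # Note: hf_hub_download preserves the repo subpath under local_dir,
        # so files from split_files/ land in a nested folder -- move them up
        # if your loader expects them directly in models/vae or models/clip.
        hf_hub_download(repo_id=repo_id, filename=filename,
                        local_dir=str(target_dir(subfolder)))
```

Calling `download_all()` then fetches every file into the folders listed above.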
Tips:
The default LightX2v Wan 2.1 Lora strength of 0.8 is set up for a more realistic look; hair and skin look more real. For an anime or comic look you can increase the strength to 1.0 or beyond (black nodes in the workflow).
Try a LightX2v Lora strength of 0.8-1.0 for the High Noise model and 1.5-2.0 for the Low Noise model to get more vivid motion (see the black Lora nodes).
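The two tips above can be summarized as a small config sketch. The stage names are hypothetical labels, not actual node names; the strengths are the starting values recommended above:

```python
# Sketch of the double LightX2v Lora injection for WAN 2.2's two-model
# pipeline: the same Wan 2.1 Lora is loaded once per model, with
# independently tunable strengths (values from the tips above).
LIGHTX2V_STRENGTHS = {
    "high_noise": 0.8,  # 0.8 = realistic look; raise to 1.0+ for anime/comic
    "low_noise": 1.5,   # 1.5-2.0 gives more vivid motion
}

def lora_strength(stage: str) -> float:
    """Return the Lora strength to use for a given model stage."""
    return LIGHTX2V_STRENGTHS[stage]
```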