Updated: Sep 3, 2025
Workflow Overview
This is a sophisticated ComfyUI workflow designed for high-quality, controllable video generation using the powerful Wan2.2 5B Fun model. It leverages ControlNet (via Canny edge detection) to transform a driving motion video and a starting reference image into a stunning, coherent animated sequence. Perfect for creating dynamic character animations with consistent style and precise motion transfer.
Core Concept: Use a "control video" (e.g., a person dancing) to guide the motion, and a "reference image" (e.g., a character design) to define the style and subject. The workflow intelligently merges them into a new, AI-generated video.
Key Features & Highlights
State-of-the-Art Model: Utilizes the Wan2.2-Fun-5B-Control-Q8_0.gguf quantized model for a balance of incredible quality and manageable hardware requirements.
Precision Control: Implements a Canny Edge ControlNet. The workflow extracts edges from your input video, ensuring the generated animation closely follows the original motion (see the sketch after this list).
Optimized for Speed: Integrates a custom LoRA (Wan2_2_5B_FastWanFullAttn), allowing for high-quality results in just 8 sampling steps without significant quality loss.
Efficient Text Encoding: Uses a separate, quantized umt5-xxl-encoder model (loaded as the CLIP text encoder), reducing VRAM load on your GPU.
Complete Pipeline: Everything from model loading, video preprocessing, conditioning, and sampling to final video encoding is included in one seamless, organized graph.
Ready-to-Use: Pre-configured with optimal settings, including a detailed positive/negative prompt. Just load your own image and video to start creating.
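To build intuition for what the Canny preprocessing step produces, here is a minimal standalone sketch using OpenCV. It only approximates what the workflow's Canny node does; the threshold values (100/200) and the preview filename are illustrative assumptions, not settings taken from the workflow itself.

```python
# Minimal sketch: extract Canny edge maps from a control video, frame by frame.
# Thresholds (100, 200) are illustrative defaults, not the workflow's exact settings.
import cv2

def extract_canny_frames(video_path, low_threshold=100, high_threshold=200):
    """Yield one Canny edge map (grayscale image) per frame of the video."""
    cap = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            yield cv2.Canny(gray, low_threshold, high_threshold)
    finally:
        cap.release()

if __name__ == "__main__":
    # Write a preview of the first frame's edge map ("canny_preview.png" is a hypothetical name).
    edges = next(extract_canny_frames("my_dance_video.mp4"))
    cv2.imwrite("canny_preview.png", edges)
```

If the preview shows clean, strong outlines of your subject, the motion transfer will generally be better.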
Workflow Structure
The workflow is neatly grouped into logical sections for easy understanding and customization:
Step 1 - Load models: Loads the main Wan2.2 5B model, its VAE, the CLIP text encoder, and the FastWan LoRA.
Step 2 - Start_image: Loads your initial reference image. This defines the character and style for the first frame.
Step 3 - Control video and video preprocessing: Loads your motion video and processes it through the Canny node to extract edge maps.
Step 4 - Prompt: Where you input your positive and negative prompts to guide the generation.
Step 5 - Video size & length: The Wan22FunControlToVideo node packages everything, setting the output video dimensions and length based on the control video.
Sampling & Decoding: The KSampler runs for 8 steps with UniPC, and the VAE decodes the latents into final images.
Video Output: The VHS_VideoCombine node encodes the image sequence into an MP4 video file.
How to Use This Workflow
Download & Install:
Ensure you have ComfyUI Manager to easily install missing custom nodes.
Required Custom Nodes: ComfyUI-VideoHelperSuite, ComfyUI-GGUF (for loading the .gguf models).
Download the .json file from this post.
Load the Models:
Main Model: Place Wan2.2-Fun-5B-Control-Q8_0.gguf in your ComfyUI/models/gguf/ folder.
CLIP Model: Place umt5-xxl-encoder-q4_k_m.gguf in the same gguf/ folder.
VAE: The workflow points to Wan2.2_VAE.safetensors. Ensure it's in your models/vae/ folder.
LoRA: Place Wan2_2_5B_FastWanFullAttn_lora_rank_128_bf16.safetensors in your models/loras/ folder. Adjust the path in the LoraLoader node if yours is in a subfolder (e.g., wan_loras/).
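Before queueing anything, a quick sanity check can save a failed run. The sketch below simply verifies that each file sits where the steps above expect it; the folder names are taken from this post, so adjust them if your install uses extra_model_paths.yaml or different subfolders.

```python
# Quick sanity check: confirm the model files are where this workflow expects them.
# Folder names follow the steps above; adjust if your install differs.
from pathlib import Path

COMFYUI_ROOT = Path("ComfyUI")  # adjust to your actual install location

EXPECTED_FILES = {
    "Main model": COMFYUI_ROOT / "models/gguf/Wan2.2-Fun-5B-Control-Q8_0.gguf",
    "CLIP model": COMFYUI_ROOT / "models/gguf/umt5-xxl-encoder-q4_k_m.gguf",
    "VAE":        COMFYUI_ROOT / "models/vae/Wan2.2_VAE.safetensors",
    "LoRA":       COMFYUI_ROOT / "models/loras/Wan2_2_5B_FastWanFullAttn_lora_rank_128_bf16.safetensors",
}

for label, path in EXPECTED_FILES.items():
    status = "OK" if path.is_file() else "MISSING"
    print(f"{status:8s} {label}: {path}")
```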
Load Your Assets:
Reference Image: In the LoadImage node, change the image name to your own file (e.g., my_character.png).
Control Video: In the LoadVideo node, change the video name to your own motion clip (e.g., my_dance_video.mp4).
Customize Your Prompt:
Edit the text in the Positive Prompt node to describe your desired character and scene.
The provided negative prompt is already comprehensive, but you can modify it as needed.
Run the Workflow:
Queue the prompt in ComfyUI. The final video will be saved to your ComfyUI/output/video/ folder.
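Queueing through the web UI is the simplest route, but if you want to script batch runs, ComfyUI also accepts workflows over its local HTTP API. The sketch below assumes a default server on 127.0.0.1:8188 and a workflow exported in API format (enable Dev mode and use "Save (API Format)"); the regular .json from this post is the UI layout and will not queue directly. The filename used here is hypothetical.

```python
# Minimal sketch: queue an API-format workflow against a local ComfyUI server.
# Assumes ComfyUI runs on the default 127.0.0.1:8188 and the workflow was
# exported via "Save (API Format)" (the UI-layout .json will not work here).
import json
import urllib.request

def queue_workflow(api_workflow_path, server="http://127.0.0.1:8188"):
    with open(api_workflow_path, "r", encoding="utf-8") as f:
        prompt = json.load(f)
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())  # response includes the prompt_id of the queued job

if __name__ == "__main__":
    # Hypothetical filename for the API-format export of this workflow.
    print(queue_workflow("wan22_fun_control_api.json"))
```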
Tips for Best Results
Control Video: Use a video with clear, strong motion and good contrast so the Canny detector picks up clean edges. Silhouettes or clips with a plain background work especially well.
Reference Image: The first frame of your output will closely match this image. Use a high-quality image of your character in a pose similar to the first frame of your control video.
Length: The length in Wan22FunControlToVideo is set to 121 based on the original video. If your video is a different length, you must update this value to match the number of frames (see the helper sketch after this list).
Experiment: Try adjusting the LoRA strength (e.g., between 0.4 and 0.7) or the Canny thresholds to fine-tune the balance between motion fidelity and creative freedom.
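If you are unsure how many frames your control video actually has, a quick check like the one below (OpenCV again, as in the earlier sketch) gives you the value to enter for length. Note that the frame count reported by some containers can be approximate.

```python
# Helper: report the frame count (and fps/duration) of a control video so you
# can set the matching "length" value in Wan22FunControlToVideo.
import cv2

def video_frame_info(video_path):
    cap = cv2.VideoCapture(video_path)
    try:
        frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # can be approximate for some codecs
        fps = cap.get(cv2.CAP_PROP_FPS) or 0.0
        duration = frames / fps if fps else float("nan")
        return frames, fps, duration
    finally:
        cap.release()

if __name__ == "__main__":
    frames, fps, duration = video_frame_info("my_dance_video.mp4")  # your control clip
    print(f"{frames} frames @ {fps:.2f} fps (~{duration:.1f} s) -> set length to {frames}")
```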
Required Models (Download Links)
Wan2.2-Fun-5B-Control-Q8_0.gguf: https://huggingface.co/QuantStack/Wan2.2-Fun-5B-Control-GGUF
umt5-xxl-encoder-q4_k_m.gguf: https://huggingface.co/city96/umt5-xxl-encoder-gguf/tree/main
Wan2.2_VAE.safetensors: https://huggingface.co/QuantStack/Wan2.2-Fun-5B-InP-GGUF/tree/main/vae
Wan2_2_5B_FastWanFullAttn_lora_rank_128_bf16.safetensors: https://huggingface.co/Kijai/WanVideo_comfy/blob/main/FastWan/Wan2_2_5B_FastWanFullAttn_lora_rank_128_bf16.safetensors
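For convenience, the files above can also be pulled programmatically with the huggingface_hub Python package; a minimal sketch follows. The repo IDs come from the links above, but double-check the exact filenames (including capitalization of the quant suffix) against each repo before running, since they may differ slightly from what is written here.

```python
# Minimal sketch: fetch the required files with huggingface_hub.
# Repo IDs are taken from the links above; verify exact filenames against each repo.
from huggingface_hub import hf_hub_download

downloads = [
    ("QuantStack/Wan2.2-Fun-5B-Control-GGUF", "Wan2.2-Fun-5B-Control-Q8_0.gguf"),
    ("city96/umt5-xxl-encoder-gguf", "umt5-xxl-encoder-q4_k_m.gguf"),
    ("QuantStack/Wan2.2-Fun-5B-InP-GGUF", "vae/Wan2.2_VAE.safetensors"),
    ("Kijai/WanVideo_comfy", "FastWan/Wan2_2_5B_FastWanFullAttn_lora_rank_128_bf16.safetensors"),
]

for repo_id, filename in downloads:
    local_path = hf_hub_download(repo_id=repo_id, filename=filename)
    print(f"Downloaded {filename} -> {local_path}")
    # Then move or copy each file into the ComfyUI model folders listed earlier.
```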
Conclusion
This workflow demonstrates the powerful synergy between the Wan2.2 model, ControlNet, and efficient LoRAs. It abstracts away the complexity, providing you with a robust, one-click solution for creating amazing AI-powered animations. Enjoy creating!
If you use this workflow, please share your results! I'd love to see what you create.