Alibaba has released a new Wan model family, Wan2.1-Fun, which supports a new collection of controlnets and a separate set of weights for inpainting. The control capability is being called "Control Video".
We will look at using it with ComfyUI. At the time of writing (28/3/2025) this is fresh out of the oven, so expect future updates with FP8 or GGUF quantizations as they become available.
INSTALLATION
1. Git clone the VideoX-Fun repo
cd ComfyUI/custom_nodes/
git clone https://github.com/aigc-apps/VideoX-Fun
git clone https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite
cd VideoX-Fun/
python install.py
2. Download models into ComfyUI/models/Fun_Models/ (a scripted alternative is sketched after this list)
Wan2.1-Fun-1.3B-InP (19GB) https://huggingface.co/alibaba-pai/Wan2.1-Fun-1.3B-InP
Wan2.1-Fun-1.3B text-to-video weights, trained at multiple resolutions, supporting start and end frame prediction.
Wan2.1-Fun-14B-InP (47GB) https://huggingface.co/alibaba-pai/Wan2.1-Fun-14B-InP
Wan2.1-Fun-14B text-to-video weights, trained at multiple resolutions, supporting start and end frame prediction.
Wan2.1-Fun-1.3B-Control (19GB) https://huggingface.co/alibaba-pai/Wan2.1-Fun-1.3B-Control
Wan2.1-Fun-1.3B video control weights, supporting various control conditions such as Canny, Depth, Pose, MLSD, etc., and trajectory control. Supports multi-resolution (512, 768, 1024) video prediction at 81 frames, trained at 16 frames per second, with multilingual prediction support.
Wan2.1-Fun-14B-Control (47GB) https://huggingface.co/alibaba-pai/Wan2.1-Fun-14B-Control
Wan2.1-Fun-14B video control weights, supporting various control conditions such as Canny, Depth, Pose, MLSD, etc., and trajectory control. Supports multi-resolution (512, 768, 1024) video prediction at 81 frames, trained at 16 frames per second, with multilingual prediction support.
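If you prefer to script the downloads, here is a minimal Python sketch using huggingface_hub's snapshot_download. The repo IDs and target folder come from the list above; placing each model in its own subfolder is my assumption about the expected layout, so check the VideoX-Fun README if loading fails.

# pip install huggingface_hub
from huggingface_hub import snapshot_download

# Repo IDs from the list above; comment out the ones you don't need,
# as each download is 19-47 GB.
MODELS = [
    "alibaba-pai/Wan2.1-Fun-1.3B-InP",
    "alibaba-pai/Wan2.1-Fun-14B-InP",
    "alibaba-pai/Wan2.1-Fun-1.3B-Control",
    "alibaba-pai/Wan2.1-Fun-14B-Control",
]

for repo_id in MODELS:
    # Each model lands in its own subfolder, e.g.
    # ComfyUI/models/Fun_Models/Wan2.1-Fun-1.3B-InP/  (assumed layout)
    snapshot_download(
        repo_id=repo_id,
        local_dir=f"ComfyUI/models/Fun_Models/{repo_id.split('/')[-1]}",
    )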
3. (Optional) Download Lora models
Place them inside ComfyUI/models/loras/fun_models/
WORKFLOWS (official)
image to video workflow:
https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset/v1.0/wan2.1_fun_workflow_i2v.json
text to video workflow:
https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset/v1.0/wan2.1_fun_workflow_t2v.json
trajectory control video generation:
https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset/v1.0/wan2.1_fun_workflow_control_trajectory.json
Control Video Generation:
https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset/v1.0/wan2.1_fun_workflow_v2v_control.json
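To grab all four workflows at once, here is a small sketch (not part of the official docs) that fetches them into a local workflows/ folder, ready to load into ComfyUI:

import os
import urllib.request

BASE = "https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/wan_fun/asset/v1.0/"
FILES = [
    "wan2.1_fun_workflow_i2v.json",
    "wan2.1_fun_workflow_t2v.json",
    "wan2.1_fun_workflow_control_trajectory.json",
    "wan2.1_fun_workflow_v2v_control.json",
]

os.makedirs("workflows", exist_ok=True)
for name in FILES:
    # Each URL is listed above; the files are small JSON node graphs.
    urllib.request.urlretrieve(BASE + name, os.path.join("workflows", name))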
GGUF WAN FUN INPAINT
The video inpainting (InP) model has been released in GGUF format by city96; you can reuse the existing Wan video pipeline, including its CLIP/VAE weights, and swap in the GGUF diffusion model.
GGUF
Q8_0 https://huggingface.co/city96/Wan2.1-Fun-14B-InP-gguf/resolve/main/wan2.1-fun-14b-inp-Q8_0.gguf
Q4_0 https://huggingface.co/city96/Wan2.1-Fun-14B-InP-gguf/resolve/main/wan2.1-fun-14b-inp-Q4_0.gguf
Place the .gguf file inside ComfyUI/models/unet/
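As a sketch (assuming you already have huggingface_hub installed), you can fetch a quant directly into the unet folder; Q4_0 is the smaller file, Q8_0 the higher-fidelity one:

from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="city96/Wan2.1-Fun-14B-InP-gguf",
    filename="wan2.1-fun-14b-inp-Q4_0.gguf",  # or wan2.1-fun-14b-inp-Q8_0.gguf
    local_dir="ComfyUI/models/unet",
)

Note that loading a .gguf diffusion model requires a GGUF-capable loader node (e.g. city96's ComfyUI-GGUF custom node) in place of the standard model loader.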
MORE INFO
https://huggingface.co/collections/alibaba-pai/wan21-fun-67e4fb3b76ca01241eb7e334