Step-by-Step Guide Series:
ComfyUI - VACE CONTROLNET
This article accompanies this workflow: link
Foreword:
This guide is intended to be as simple as possible, and certain terms will be simplified.
Workflow description:
This workflow extracts the motion of a video via ControlNet (pose, canny, or depth) and creates a new video, with that motion, from an image of your choice.
Prerequisites:
If you are on Windows, you can use my script to download and install all prerequisites: link
ComfyUI,
Microsoft Visual Studio Build Tools:
winget install --id Microsoft.VisualStudio.2022.BuildTools -e --source winget --override "--quiet --wait --norestart --add Microsoft.VisualStudio.Component.VC.Tools.x86.x64 --add Microsoft.VisualStudio.Component.Windows10SDK.20348"
📂Files:
Recommendation:
>24 GB VRAM: base or Q8_0
16 GB VRAM: Q5_K_S
<12 GB VRAM: Q4_K_S
For the base version:
VACE Model: wan2.1_vace_14B_fp8_e4m3fn.safetensors or wan2.1_vace_1.3B_fp16.safetensors
in models/diffusion_models
For the GGUF version:
Quant Model: Wan2.1-VACE-14B-QX_0.gguf (choose the quant level recommended above)
in models/diffusion_models
CLIP: umt5_xxl_fp8_e4m3fn_scaled.safetensors
in models/clip
VAE: wan_2.1_vae.safetensors
in models/vae
LoRA: Wan21_CausVid_14B_T2V_lora_rank32.safetensors
in models/loras
ANY upscale model:
Realistic: RealESRGAN_x4plus.pth
Anime: RealESRGAN_x4plus_anime_6B.pth
in models/upscale_models
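For reference, here is what the relevant part of your ComfyUI folder should look like once everything is in place (shown with the base 14B files; swap in the GGUF file if you use that version):

ComfyUI/
└─ models/
   ├─ diffusion_models/
   │  └─ wan2.1_vace_14B_fp8_e4m3fn.safetensors
   ├─ clip/
   │  └─ umt5_xxl_fp8_e4m3fn_scaled.safetensors
   ├─ vae/
   │  └─ wan_2.1_vae.safetensors
   ├─ loras/
   │  └─ Wan21_CausVid_14B_T2V_lora_rank32.safetensors
   └─ upscale_models/
      └─ RealESRGAN_x4plus.pth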
📦Custom Nodes:
Don't forget to close the workflow and open it again once the nodes have been installed.
Usage:
In this new version of the workflow, everything is organized by color:
Green is what you want to create, also called the prompt,
Red is what you don't want,
Yellow is all the parameters for adjusting the video,
Pale blue is for feature-activation nodes,
Blue is for the model files used by the workflow,
Purple is for LoRAs.
We will now see how to use each node:
Write what you want in the “Positive” node:
Write what you don't want in the “Negative” node:
Choose whether you want automatic prompt addition:
If enabled, the workflow will analyze your image and automatically add a description of it to your prompt.
Select the image format:
The larger it is, the longer the generation time and the more VRAM required.
Choose a number of steps:
With the CausVid LoRA, I recommend between 10 and 15. The higher the number, the better the quality, but the longer the video takes to generate.
Choose the guidance level:
I recommend starting at 1. The lower the number, the more freedom you leave the model. The higher the number, the more the result will resemble what you “strictly” asked for.
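For intuition, classifier-free guidance combines the model's conditioned and unconditioned predictions roughly like this (a conceptual sketch of the standard CFG formula, not the workflow's internal code):

import torch

def cfg_blend(cond: torch.Tensor, uncond: torch.Tensor, scale: float) -> torch.Tensor:
    # scale = 1.0 returns the conditioned prediction unchanged (the model is left "free");
    # higher scales push the result further away from the unconditioned prediction,
    # i.e. harder toward the prompt.
    return uncond + scale * (cond - uncond)

This is why 1 is a sensible starting point here: the model simply follows the prompt without being forced.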
Define a seed or let ComfyUI generate one:
Choose whether you want to force SageAttention usage:
Here you can activate SageAttention. This option is quite complex; you can read my dedicated guide here. If you don't know what it is, don't enable it. If you used my installer for ComfyUI, you can use this optimization.
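As a side note, recent ComfyUI versions also accept a launch flag that enables SageAttention globally, assuming the SageAttention package itself is installed in your Python environment (check python main.py --help on your build):

python main.py --use-sage-attention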
Add as many LoRAs as you want to use, and configure them:
If you don't know what a LoRA is, just activate CausVid to speed up generation.
Choose whether you want to remove the background from your imported image:
Import your first-frame image:
Don't forget that it will be shrunk or enlarged to the format you've chosen. An image whose resolution differs too much from that format can lead to poor results.
Import your "control" video :
Now you're ready to create your video.
Just click on the “Queue” button to start:
A preview of the ControlNet pass will be displayed here:
Then the final video:
But there are still plenty of menus left? Yes indeed; here is an explanation of the additional options menus:
Here you can change the model files.
The Virtual VRAM option allows you to offload part of the model to your system RAM instead of VRAM.
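Conceptually, this kind of offloading keeps some weights in system RAM and moves them onto the GPU only for the moment they are needed. A toy PyTorch sketch of the idea (not the node's actual implementation; requires a CUDA GPU):

import torch

layer = torch.nn.Linear(4096, 4096)  # weights start in system RAM ("cpu")

def run_offloaded(layer: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    layer.to("cuda")             # load the weights into VRAM just for this call
    y = layer(x.to("cuda"))
    layer.to("cpu")              # move them back, freeing the VRAM
    return y.cpu()

print(run_offloaded(layer, torch.randn(1, 4096)).shape)

The trade-off is speed: the extra transfers make generation slower, but they let larger models fit on smaller GPUs.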
Here you can choose the ControlNet processor you want to use:
DWPose creates a skeleton and follows its movement,
Depth preserves the shapes,
Canny keeps the lines.
These nodes allow you to enable interpolation and choose its factor. To put it simply, this generates intermediate frames and thus increases the fluidity of the video, as illustrated below.
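A quick back-of-the-envelope illustration, assuming the factor simply multiplies the frame rate:

def interpolated_fps(fps_in: float, factor: int) -> float:
    # a factor of 2 doubles the frame rate, a factor of 4 quadruples it
    return fps_in * factor

print(interpolated_fps(16, 2))  # 32.0 fps, twice as fluid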
Here you can enable an upscaler and interpolation, which increase the resolution and the fluidity of your video.
Saving the last frame of your video makes it easier to create a sequel by reusing that frame as the starting point for a new video.
This last node allows you to activate different optimizations:
CFGZeroStar: Improves the overall quality of low-CFG generations by blending outputs to reduce artifacts.
- Helps reduce flickering and hallucinations.
NOTE: May reduce contrast/detail if overused.
Temporal Attention: Lets the model attend to multiple frames at once, rather than treating each frame independently.
- Improves consistency in moving objects (hair, limbs, etc.)
NOTE: Uses more VRAM; may slow down generation.
Long video patch: This option uses RifleXRoPE to reduce glitches in videos longer than 5 seconds. The developer recommends up to 8 seconds with WAN and this module.
Torch Compile: Optimizes your model into a faster, more efficient version.
- Significantly speeds up processing
- Reduces memory usage and improves performance
NOTE: The 1st run will be slower due to compilation/optimization.
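To give a rough idea of what the Torch Compile option does, here is a minimal stand-alone PyTorch sketch (requires PyTorch 2.x; this shows the general torch.compile mechanism, not the workflow's own node code):

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.GELU(),
    torch.nn.Linear(64, 64),
)

compiled = torch.compile(model)  # wrap the model; nothing is compiled yet
x = torch.randn(8, 64)
y = compiled(x)   # 1st call is slow: the graph is traced and optimized kernels are built
y = compiled(x)   # later calls reuse the compiled code and run faster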
Some additional information:
Organization of recordings:
All generated files are stored in comfyui/output/WAN/YYYY-MM-DD.
Depending on the options chosen, you will find:
"hhmmss_OG_XXXXX" the new extended video,
"hhmmss_IN_XXXXX" the interpolated video,
"hhmmss_UP_XXXXX" the upscaled video,
"hhmmss_EX_XXXXX" the extension,
"hhmmss_LF_XXXXX" the last frame.