| Field | Value |
|---|---|
| Type | Workflows |
| Stats | 300 0 |
| Reviews | (25) |
| Published | Mar 25, 2025 |
| Base Model | — |
| Hash | AutoV2 74717423D5 |
# Kiko9 ComfyUI WAN 2.1 Native Workflow

A ComfyUI image-to-video (I2V) pipeline built around WAN 2.1, using native ComfyUI nodes and Torch compilation (`torch.compile`) for performance gains. The design includes two-pass generation, frame interpolation, upscaling, and slow motion, tailored for high-fidelity AI-enhanced video generation.
Link to workflow I use for start image:
## Workflow Overview

## Project Breakdown

### Project Settings
- **Project File Path Generator:** Saves outputs under a defined base path. Set this to your local output folder.
- **User Action:** Update `root_path` to your preferred save location.
### Aspect Ratio Logic (Don't Touch)

Calculates `width` and `height` from the image size using a float-to-int conversion to maintain the aspect ratio.

⚠️ **Warning:** Do not modify unless you understand aspect ratio propagation.
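The node's math can be sketched in plain Python. This is a hedged approximation, not the node's confirmed internals: the 480-pixel short-side target and the multiple-of-16 snapping are assumptions (many video models require dimensions divisible by a fixed multiple).

```python
def fit_to_target(src_w: int, src_h: int, target_short: int = 480, multiple: int = 16) -> tuple[int, int]:
    """Scale (src_w, src_h) so the short side lands near target_short,
    preserving aspect ratio. Each dimension is rounded to a multiple
    (assumption: the model expects dimensions divisible by 16)."""
    scale = target_short / min(src_w, src_h)            # float scale factor
    w = int(round(src_w * scale / multiple)) * multiple  # float-to-int conversion
    h = int(round(src_h * scale / multiple)) * multiple
    return w, h
```

For example, a 960x1664 source image maps to the 480x832 target exactly, because the aspect ratios match.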
### Image Generation for Video (Optimized Resolution)
When creating video frames using image generation tools like FLUX / SDXL, it's important to generate at the right resolution to maintain sharpness and consistency.
#### Target Video Resolution

- Target size: `480x832`
- Aspect ratio: 480 ÷ 832 ≈ 0.577
#### Ideal Generation Resolution
To preserve details and allow for high-quality downscaling, generate at 2x or higher resolution. A perfect match in aspect ratio ensures you avoid cropping or distortion.
| Gen Resolution | Aspect Ratio | Notes |
|---|---|---|
| 960x1664 | 960 ÷ 1664 ≈ 0.577 | Perfect aspect ratio match |
| 1024x1536 | 1024 ÷ 1536 ≈ 0.6667 | Slight crop or padding needed |
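To illustrate why downscaled high-res output looks cleaner, here is a minimal 2x box filter that averages each 2x2 block of grayscale values into one pixel. This is a toy sketch of the "downscaling averages pixels" idea; a real pipeline would use Lanczos or bicubic resampling in an image library.

```python
def box_downscale_2x(pixels: list[list[float]]) -> list[list[float]]:
    """Halve a grayscale image (2D list) by averaging each 2x2 block.
    Assumes even width and height."""
    h, w = len(pixels), len(pixels[0])
    return [
        [(pixels[y][x] + pixels[y][x + 1] + pixels[y + 1][x] + pixels[y + 1][x + 1]) / 4
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]
```

A single noisy spike in a 2x2 block is diluted by its three neighbors, which is why jagged edges and noise smooth out when downscaling from 960x1664 to 480x832.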
#### Workflow

Generate high-res images: use `960x1664` or larger with the same aspect ratio, using FLUX, SDXL, etc.
#### Why This Works

- High-res generation reduces artifacts and increases fidelity.
- Downscaling averages pixels, smoothing jagged edges and noise.
- Maintaining the same aspect ratio avoids warping or unnecessary padding.
### Loaders

- **Load Checkpoint (WAN 2.1):** Loads the WAN 2.1 native (ComfyUI) model checkpoint.
- **VAE & CLIP Loader:** Loads the required VAE and CLIP encoders.
- **Power LoRA Loader (optional):** Loads LoRA models.
- **Tile Cache, Enhance, and CLIP Vision:** Load auxiliary models.
- **User Action:** Set `ckpt_name`, `vae_name`, and `clip_name` according to your local model files. Ensure the files are in your configured ComfyUI model folders.
### Image / Resize

- **Load Image / Resize:** Loads the input image (or the first frame of a video clip) and resizes it to model-appropriate dimensions.
### Global Settings

- **CLIP Text Encode (Prompt & Negative):** Prompts for conditioning the model.
- **User Action:** Customize these prompts for your subject/style.
- **Seed Generator / Upscale Factor:** Controls the random seed and image scale-up.
- **User Action:** Set `seed` for reproducibility, or leave it at -1 for a random seed.
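The -1-means-random convention can be sketched like this (a hypothetical helper, not a ComfyUI node; the 32-bit seed range is an assumption):

```python
import random

def resolve_seed(seed: int) -> int:
    """Return the seed unchanged, or draw a fresh random one when seed == -1.
    Fixing the seed makes a run reproducible; -1 gives a new result each run."""
    if seed == -1:
        return random.randint(0, 2**32 - 1)
    return seed
```

Record the resolved seed in your output filename or notes if you want to reproduce a good generation later.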
### 1st Pass (Initial Generation)

- **KSampler:** Runs the initial inference.
- **VAE Decode & Video Combine:** Decodes the latent to images and combines them with the source.
- **Slow Motion / PlaySound:** Optional audio sync and slow-motion settings.
- Select the last frame as the 2nd pass start frame (pop-up window).
### 2nd Pass (Refine & Extend)

- Similar to the 1st pass, but optimized for longer inference or higher quality.
- Takes the last frame from the 1st pass as the 2nd pass starting image.
- **Get Mask Range From Clip:** Extracts mask regions for attention.
- **Image Batch Multi:** Processes multiple frames simultaneously.
### Upscaling & Frame Interpolation

- **Image Sharpen / Restore Faces:** Post-processing enhancements.
- **Upscale Image:** Real-ESRGAN or similar.
- **Frame Interpolation (RIFE / FILM):** Smooth transitions for higher FPS.
- **Slow Motion:** Optional; adds and blends frames for cinematic slow motion.
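The "adds and blends frames" step can be sketched with a naive cross-fade. This is not what RIFE/FILM actually do (they estimate motion with learned models), but it shows the frame-insertion idea behind slow motion:

```python
def slow_motion_blend(frames: list[list[float]], factor: int = 2) -> list[list[float]]:
    """Insert (factor - 1) linearly blended frames between each consecutive pair.
    Frames are flat lists of grayscale values; real nodes operate on full images."""
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        for k in range(1, factor):
            t = k / factor
            # Cross-fade: each inserted frame is a weighted mix of its neighbors.
            out.append([(1 - t) * pa + t * pb for pa, pb in zip(a, b)])
    out.append(frames[-1])
    return out
```

Played back at the original FPS, the extra frames stretch the clip by roughly the chosen factor, which is why disabling either interpolation or slow motion (see the FAQ) fixes laggy output.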
### Experimental (Optional, Long Runtime)

- Advanced enhancement or second-stage denoising/refinement.
- Useful for batch rendering with very high quality needs.
- ⏱️ **Warning:** These steps significantly increase processing time.
## Torch Compile Setup (VERY IMPORTANT)

To unlock native acceleration via `torch.compile`, ensure you meet these requirements:
### Requirements

- PyTorch 2.1+ with CUDA
- NVIDIA GPU with Ampere or later architecture (RTX 30xx, 40xx)
- Use the latest ComfyUI (nightly) or manually apply `torch.compile()` patching.
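A rough pre-flight check mirroring the requirements above can be written without importing torch at all. The GPU-name matching here is a simplification for illustration; in a real environment you would query `torch.cuda.get_device_capability()` (Ampere is compute capability 8.0+) instead of string-matching names.

```python
def meets_compile_requirements(torch_version: str, cuda_available: bool, gpu_name: str) -> bool:
    """Check PyTorch >= 2.1, CUDA availability, and an Ampere-or-newer GPU.
    Name-prefix matching is an assumption for this sketch, not a robust check."""
    # "2.1.0+cu121" -> (2, 1); strip the local build tag before parsing.
    major, minor = (int(p) for p in torch_version.split("+")[0].split(".")[:2])
    ampere_or_newer = any(tag in gpu_name for tag in ("RTX 30", "RTX 40", "A100", "H100"))
    return (major, minor) >= (2, 1) and cuda_available and ampere_or_newer
```

If the check fails, skip the compile step rather than letting the workflow crash mid-run (see the FAQ below).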
## Saving Outputs

- Controlled via the Project Path Generator and Video Combine nodes.
- The output format (e.g. `.mp4`, `.png`, `.webm`) should be set explicitly in `Video Combine`.
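A minimal path-generator sketch under `root_path`, for readers who want to understand what the node does. The timestamped naming scheme is an assumption for illustration, not the node's exact convention:

```python
from datetime import datetime
from pathlib import Path

def build_output_path(root_path: str, project: str, ext: str = ".mp4") -> Path:
    """Build <root_path>/<project>/<timestamp><ext>, creating folders as needed.
    The extension should match the format chosen in Video Combine."""
    folder = Path(root_path) / project
    folder.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return folder / f"{stamp}{ext}"
```

Timestamped names avoid silently overwriting an earlier render when a batch writes many clips into the same folder.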
## Notes

- ⚠️ The first run with `torch.compile` will be slow due to graph tracing.
- Prompt tuning is crucial for WAN 2.1; try detailed descriptions.
- ⚠️ Not optimized for older machines.
## FAQ

**Q: My output is laggy or missing frames.**
A: Check the interpolation and slow-motion settings; disable one if not needed.

**Q: The workflow crashes during torch compile.**
A: Ensure you're using PyTorch 2.1+ and your GPU is Ampere or newer.

**Q: Can I use this with other models like SDXL?**
A: You can, but WAN 2.1 is optimized for this specific setup. Results may vary.
## Credits

- Workflow design by Kiko9
- WAN 2.1
- ComfyUI team for the powerful modular engine
## Folder Structure Example

```
ComfyUI/
├── models/
│   ├── checkpoints/
│   ├── vae/
│   └── clip/
├── output/
│   └── generated/
└── custom_nodes/
```
## End-to-End WAN 2.1 Generation Summary

| Step | Description | Time / Count | Resolution |
|---|---|---|---|
| Prompt Start | Initial prompt execution begins | 92.95 sec | — |
| Model Load | Loaded WAN 2.1 model weights | ~15,952 ms | — |
| First Comfy-VFI Pass | Generated frames with TeaCache initialized | ~6 min 13 sec | 480x832 |
| Frames Generated (1st pass) | Comfy-VFI output | 231 frames | 480x832 |
| Second Comfy-VFI Pass | Repeats generation with same steps | ~6 min 28 sec | 480x832 |
| Frames Generated (2nd pass) | Comfy-VFI output | (implied) | 480x832 |
| WanVAE Load (1st) | Loaded latent space model | ~1220 ms | — |
| WanVAE Load (2nd) | Loaded again for reuse | ~1304 ms | — |
| Face Restoration (GFPGAN) | GFPGANv1.4 restored images | 152 frames | 512x512 |
| Comfy-VFI Run (3rd) | Generated additional frames | unknown | 960x1664 |
| Frames Generated (3rd pass) | Comfy-VFI output | 456 frames | 960x1664 |
| Comfy-VFI Run (4th) | Final batch of generation | unknown | 960x1664 |
| Frames Generated (4th pass) | Comfy-VFI output | 304 frames | 960x1664 |
| Prompt End | Final step of pipeline | 1050.60 sec | — |
### Notes

- "TeaCache skipped" 12 conditional + 12 unconditional steps per 30, roughly a 20% optimization.
- The face restoration step was applied to a subset (152 frames).
- The 960x1664 resolution used in the last two passes matches the 480x832 aspect ratio exactly, ideal for downscaling or 2x video output.
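The exact-match claim is easy to verify with two lines of arithmetic:

```python
# 960x1664 is exactly 2x 480x832, so downscaling needs no crop or padding.
hi, lo = (960, 1664), (480, 832)
assert hi[0] / hi[1] == lo[0] / lo[1]   # identical aspect ratio
assert (hi[0] // 2, hi[1] // 2) == lo   # exact 2x relationship
```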
## Feedback & Contributions
Feel free to submit issues if you encounter bugs or want to contribute improvements.
Happy rendering!