Kiko9 WAN 2.1 Native (ComfyUI)

Type: Workflow
Base Model: Wan Video
Hash (AutoV2): 74717423D5
Published: Mar 25, 2025
Creator: kiko9

🧠 Kiko9 ComfyUI WAN 2.1 Native Workflow

A ComfyUI image-to-video (I2V) pipeline built around WAN 2.1, using native ComfyUI nodes and Torch compilation (torch.compile) for performance gains. The design includes two-pass generation, frame interpolation, upscaling, and slow motion, tailored for high-fidelity AI-enhanced video generation.

Link to workflow I use for start image:


📦 Workflow Overview


๐Ÿ› ๏ธ Project Breakdown

๐Ÿ”ง Project Settings

  • Project File Path Generator: Allows saving outputs with a defined base path. Set this to your local output folder.

    • โœ… User Action: Update root_path to your preferred save location.
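As a rough illustration of what a path generator like this does, here is a minimal plain-Python sketch; the root_path parameter mirrors the workflow setting, while the project name and date-stamped subfolder are assumptions for the example, not the node's exact behavior:

```python
import os
from datetime import datetime

def build_output_path(root_path: str, project: str = "wan21") -> str:
    """Create a dated output folder under root_path.

    The project name and date-stamped layout are illustrative only."""
    stamp = datetime.now().strftime("%Y-%m-%d")
    out_dir = os.path.join(root_path, project, stamp)
    os.makedirs(out_dir, exist_ok=True)  # safe if the folder already exists
    return out_dir
```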


🧮 Aspect Ratio Logic (Don't Touch)

  • Calculates width and height from the input image size using a float-to-int conversion to maintain the aspect ratio.

    • ⚠️ Do not modify unless you understand aspect ratio propagation.


📸 Image Generation for Video (Optimized Resolution)

  • When creating video frames with image-generation models such as FLUX or SDXL, it is important to generate at the right resolution to maintain sharpness and consistency.

🎯 Target Video Resolution

  • Target Size: 480x832

  • Aspect Ratio: 480 ÷ 832 ≈ 0.577

✅ Ideal Generation Resolution

To preserve details and allow for high-quality downscaling, generate at 2x or higher resolution. A perfect match in aspect ratio ensures you avoid cropping or distortion.

| Gen Resolution | Aspect Ratio | Notes |
| --- | --- | --- |
| 960x1664 | 960 ÷ 1664 ≈ 0.577 | ✅ Perfect aspect ratio match |
| 1024x1536 | 1024 ÷ 1536 ≈ 0.6667 | 🔶 Slight crop or padding needed |

🔄 Workflow

  1. Generate high-res images: use 960x1664 or larger at the same aspect ratio, with FLUX, SDXL, etc.

🧮 Why This Works

  • High-res generation reduces artifacts and increases fidelity.

  • Downscaling averages pixels, smoothing jagged edges and noise.

  • Maintaining the same aspect ratio avoids warping or unnecessary padding.
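To see why the aspect-ratio match matters, this hypothetical helper computes how much cropping a given generation resolution would need before downscaling to the target:

```python
def downscale_plan(gen_w: int, gen_h: int, tgt_w: int, tgt_h: int):
    """Return (scale, crop_w, crop_h): the uniform scale factor from target
    to generated size, and the pixels that must be cropped because the
    aspect ratios differ."""
    scale = min(gen_w / tgt_w, gen_h / tgt_h)
    used_w, used_h = round(tgt_w * scale), round(tgt_h * scale)
    return scale, gen_w - used_w, gen_h - used_h
```

960x1664 against the 480x832 target gives scale 2.0 with zero crop, while 1024x1536 would need roughly 138 px trimmed horizontally.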


📥 Loaders

  • Load Checkpoint (WAN2.1): Loads the WAN 2.1 native (ComfyUI) model checkpoint.

  • VAE & CLIP Loader: Loads the required VAE and CLIP encoders.

  • Power LoRA Loader (optional): Loads and stacks LoRAs.

  • Tile Cache, Enhance, and CLIP Vision: Load auxiliary models.

    • ✅ User Action:

      • Set ckpt_name, vae_name, and clip_name according to your local model files.

      • Ensure the files are in your configured ComfyUI model folders.
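A quick sanity check before loading the workflow can save a failed run. This sketch lists any required model files missing from the standard ComfyUI layout; the file names below are hypothetical placeholders, so substitute whatever your loader nodes actually reference:

```python
from pathlib import Path

# Hypothetical file names; replace with the files your loaders reference.
REQUIRED = {
    "checkpoints": "wan2.1_i2v_480p.safetensors",
    "vae": "wan_2.1_vae.safetensors",
    "clip": "umt5_xxl_fp8.safetensors",
}

def missing_models(comfy_root: str) -> list[str]:
    """List required model files not found under ComfyUI/models/<subfolder>/."""
    models = Path(comfy_root) / "models"
    return [f"{sub}/{name}" for sub, name in REQUIRED.items()
            if not (models / sub / name).exists()]
```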


๐Ÿ–ผ๏ธ Image / Resize

  • Load Image / Resize: Loads the input image or first frame from a video clip, resizes it to model-appropriate dimensions.


๐ŸŒ Global Settings

  • CLIP Text Encode (Prompt & Negative): Prompts for conditioning the model.

    • โœ… User Action: Customize these prompts per your subject/style.

  • Seed Generator / Upscale Factor: Controls random seed and image scale-up.

    • โœ… User Action: Set seed for reproducibility or leave -1 for random.
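The seed convention can be sketched in a couple of lines; the 32-bit range is an assumption for the example, as seed node ranges vary:

```python
import random

def resolve_seed(seed: int) -> int:
    """Keep a fixed seed for reproducible runs; -1 means draw a fresh random seed."""
    return random.randint(0, 2**32 - 1) if seed == -1 else seed
```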


๐Ÿ” 1st Pass (Initial Generation)

  • KSampler: Runs the initial inference.

  • VAE Decode & Video Combine: Decodes latent space to image, combines with source.

  • Slow Motion / PlaySound: Optional audio sync and slow-mo settings.

  • Select last frame for 2nd pass start frame. (Pop Up window)


๐Ÿ” 2nd Pass (Refine & Extend)

  • Similar to 1st Pass but optimized for longer inference or higher quality.

  • Take last frame from 1st pass as 2nd pass starting image.

  • Get Mask Range From Clip: Extracts mask regions for attention.

  • Image Batch Multi: Processes multiple frames simultaneously.
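The hand-off between passes boils down to one rule, sketched here outside ComfyUI for clarity (the frame list stands in for whatever batch the 1st pass emits):

```python
def second_pass_start(first_pass_frames: list):
    """The 2nd pass starts from the last frame the 1st pass produced."""
    if not first_pass_frames:
        raise ValueError("1st pass produced no frames")
    return first_pass_frames[-1]
```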


📈 Upscaling & Frame Interpolation

  • Image Sharpen / Restore Faces: Post-processing enhancements.

  • Upscale Image (Real-ESRGAN or similar).

  • Frame Interpolation (RIFE / FILM): Smooth transitions for higher FPS.

  • Slow Motion: Optional; adds frames and blends them for cinematic slow-mo.
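The frame-count arithmetic behind interpolation and slow motion can be sketched as follows; the 2x factor and fps values are examples, not fixed workflow settings:

```python
def interpolated_count(frames: int, factor: int) -> int:
    """Frames after inserting (factor - 1) in-between frames for every
    adjacent pair: frames + (frames - 1) * (factor - 1)."""
    return frames + (frames - 1) * (factor - 1)

def playback_seconds(frames: int, fps: float, slowmo: float = 1.0) -> float:
    """Duration at a given fps; slowmo > 1 stretches playback time."""
    return frames / fps * slowmo
```

For example, 2x interpolation of a 231-frame pass yields 461 frames.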


🧪 Experimental (Optional, Long Runtime)

  • Advanced enhancement or second-stage denoising/refinement.

  • Useful for batch rendering with very high quality needs.

    • ⏱️ Warning: These steps significantly increase processing time.


⚡ Torch Compile Setup (VERY IMPORTANT)

To unlock native acceleration via torch.compile, ensure you meet these requirements:

✅ Requirements

  • PyTorch 2.1+ with CUDA

  • NVIDIA GPU with Ampere or later architecture (RTX 30XX, 40XX)

  • Use the latest ComfyUI nightly, or manually apply torch.compile() patching.
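As a sketch, the checks above can be expressed as a plain-Python predicate; in practice you would feed it torch.__version__, torch.cuda.is_available(), and torch.cuda.get_device_capability() (Ampere corresponds to compute capability 8.0 and up):

```python
def meets_compile_reqs(torch_version: str, cuda_ok: bool,
                       capability: tuple) -> bool:
    """True when PyTorch is 2.1+ with CUDA and the GPU is Ampere or newer."""
    # "2.1.0+cu121" -> (2, 1); the "+cuXXX" local tag is stripped first
    major, minor = (int(p) for p in torch_version.split("+")[0].split(".")[:2])
    return (major, minor) >= (2, 1) and cuda_ok and capability >= (8, 0)
```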


💾 Saving Outputs

  • Controlled via the Project Path Generator and Video Combine nodes.

  • The output format (e.g. .mp4, .png, .webm) should be set explicitly in Video Combine.


📋 Notes

  • ⚠️ The first run of torch.compile will be slow due to graph tracing.

  • 🧠 Prompt tuning is crucial for WAN 2.1; try detailed descriptions.

  • ⚠️ Not optimized for older machines.


🙋 FAQ

Q: My output is laggy or missing frames.

  • Check the interpolation and slow-motion settings; disable one if not needed.

Q: The workflow crashes during torch compile.

  • Ensure you're using PyTorch 2.1+ and your GPU is Ampere or newer.

Q: Can I use this with other models like SDXL?

  • You can, but WAN 2.1 is optimized for this specific setup. Results may vary.


📎 Credits

  • Workflow design by Kiko9

  • WAN 2.1

  • ComfyUI team for the powerful modular engine


📂 Folder Structure Example

ComfyUI/
├── models/
│   ├── checkpoints/
│   ├── vae/
│   └── clip/
├── output/
│   └── generated/
└── custom_nodes/


📊 End-to-End WAN 2.1 Generation Summary

| Step | Description | Time / Count | Resolution |
| --- | --- | --- | --- |
| Prompt Start | Initial prompt execution begins | 92.95 sec | - |
| Model Load | Loaded WAN 2.1 model weights | ~15,952 ms | - |
| First Comfy-VFI Pass | Generated frames with TeaCache initialized | ~6 min 13 sec | 480x832 |
| Frames Generated (1st pass) | Comfy-VFI output | 231 frames | 480x832 |
| Second Comfy-VFI Pass | Repeats generation with the same steps | ~6 min 28 sec | 480x832 |
| Frames Generated (2nd pass) | Comfy-VFI output | (implied) | 480x832 |
| WanVAE Load (1st) | Loaded latent space model | ~1220 ms | - |
| WanVAE Load (2nd) | Loaded again for reuse | ~1304 ms | - |
| Face Restoration (GFPGAN) | GFPGANv1.4 restored images | 152 frames | 512x512 |
| Comfy-VFI Run (3rd) | Generated additional frames | unknown | 960x1664 |
| Frames Generated (3rd pass) | Comfy-VFI output | 456 frames | 960x1664 |
| Comfy-VFI Run (4th) | Final batch of generation | unknown | 960x1664 |
| Frames Generated (4th pass) | Comfy-VFI output | 304 frames | 960x1664 |
| Prompt End | Final step of pipeline | 1050.60 sec | - |

โ„น๏ธ Notes:

  • "TeaCache skipped" 12 conditional + 12 unconditional steps per 30 = ~20% optimization.

  • Face restoration step was applied to a subset (152 frames).

  • The 960x1664 resolution used in the last two passes matches the 480x832 aspect ratio perfectly, ideal for downscaling or 2x video output.

๐Ÿ—จ๏ธ Feedback & Contributions

Feel free to submit issues if you encounter bugs or want to contribute improvements.


๐Ÿ”ฅ Happy rendering!