Kiko9 WAN 2.1 Native (ComfyUI)

Type: Workflow
Base Model: Wan Video
Hash (AutoV2): 74717423D5
Published: Mar 25, 2025
Creator: kiko9

🧠 Kiko9 ComfyUI WAN 2.1 Native Workflow

A ComfyUI image-to-video (I2V) pipeline built around WAN 2.1, using native ComfyUI nodes and Torch compilation (torch.compile) for performance gains. The design includes two-pass generation, frame interpolation, upscaling, and slow motion, tailored for high-fidelity AI-enhanced video generation.

Link to workflow I use for start image:


📦 Workflow Overview


๐Ÿ› ๏ธ Project Breakdown

๐Ÿ”ง Project Settings

  • Project File Path Generator: Allows saving outputs with a defined base path. Set this to your local output folder.

    • โœ… User Action: Update root_path to your preferred save location.
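As a rough illustration of what a path generator like this does, here is a minimal plain-Python sketch; the root_path parameter mirrors the workflow setting, while the project name and date-stamped subfolder are assumptions for the example, not the node's exact behavior:

```python
import os
from datetime import datetime

def build_output_path(root_path: str, project: str = "wan21") -> str:
    """Create a dated output folder under root_path.

    The project name and date-stamped layout are illustrative only."""
    stamp = datetime.now().strftime("%Y-%m-%d")
    out_dir = os.path.join(root_path, project, stamp)
    os.makedirs(out_dir, exist_ok=True)  # safe if the folder already exists
    return out_dir
```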


🧮 Aspect Ratio Logic (Don't Touch)

  • Calculates width and height from the input image size using a float-to-int conversion to maintain the aspect ratio.

    • ⚠️ Do not modify unless you understand aspect ratio propagation.


📸 Image Generation for Video (Optimized Resolution)

  • When creating video frames with image-generation models such as FLUX or SDXL, it is important to generate at the right resolution to maintain sharpness and consistency.

🎯 Target Video Resolution

  • Target Size: 480x832

  • Aspect Ratio: 480 ÷ 832 ≈ 0.577

✅ Ideal Generation Resolution

To preserve details and allow for high-quality downscaling, generate at 2x or higher resolution. A perfect match in aspect ratio ensures you avoid cropping or distortion.

| Gen Resolution | Aspect Ratio | Notes |
| --- | --- | --- |
| 960x1664 | 960 ÷ 1664 ≈ 0.577 | ✅ Perfect aspect ratio match |
| 1024x1536 | 1024 ÷ 1536 ≈ 0.6667 | 🔶 Slight crop or padding needed |

🔄 Workflow

  1. Generate high-res images: use 960x1664 or larger at the same aspect ratio, with FLUX, SDXL, etc.

🧮 Why This Works

  • High-res generation reduces artifacts and increases fidelity.

  • Downscaling averages pixels, smoothing jagged edges and noise.

  • Maintaining the same aspect ratio avoids warping or unnecessary padding.
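To see why the aspect-ratio match matters, this hypothetical helper computes how much cropping a given generation resolution would need before downscaling to the target:

```python
def downscale_plan(gen_w: int, gen_h: int, tgt_w: int, tgt_h: int):
    """Return (scale, crop_w, crop_h): the uniform scale factor from target
    to generated size, and the pixels that must be cropped because the
    aspect ratios differ."""
    scale = min(gen_w / tgt_w, gen_h / tgt_h)
    used_w, used_h = round(tgt_w * scale), round(tgt_h * scale)
    return scale, gen_w - used_w, gen_h - used_h
```

960x1664 against the 480x832 target gives scale 2.0 with zero crop, while 1024x1536 would need roughly 138 px trimmed horizontally.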


📥 Loaders

  • Load Checkpoint (WAN2.1): Loads the WAN 2.1 native (ComfyUI) model checkpoint.

  • VAE & CLIP Loader: Loads the required VAE and CLIP encoders.

  • Power LoRA Loader (optional): Loads and stacks LoRAs.

  • Tile Cache, Enhance, and CLIP Vision: Load auxiliary models.

    • ✅ User Action:

      • Set ckpt_name, vae_name, and clip_name according to your local model files.

      • Ensure the files are in your configured ComfyUI model folders.
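A quick sanity check before loading the workflow can save a failed run. This sketch lists any required model files missing from the standard ComfyUI layout; the file names below are hypothetical placeholders, so substitute whatever your loader nodes actually reference:

```python
from pathlib import Path

# Hypothetical file names; replace with the files your loaders reference.
REQUIRED = {
    "checkpoints": "wan2.1_i2v_480p.safetensors",
    "vae": "wan_2.1_vae.safetensors",
    "clip": "umt5_xxl_fp8.safetensors",
}

def missing_models(comfy_root: str) -> list[str]:
    """List required model files not found under ComfyUI/models/<subfolder>/."""
    models = Path(comfy_root) / "models"
    return [f"{sub}/{name}" for sub, name in REQUIRED.items()
            if not (models / sub / name).exists()]
```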


๐Ÿ–ผ๏ธ Image / Resize

  • Load Image / Resize: Loads the input image or first frame from a video clip, resizes it to model-appropriate dimensions.


๐ŸŒ Global Settings

  • CLIP Text Encode (Prompt & Negative): Prompts for conditioning the model.

    • โœ… User Action: Customize these prompts per your subject/style.

  • Seed Generator / Upscale Factor: Controls random seed and image scale-up.

    • โœ… User Action: Set seed for reproducibility or leave -1 for random.
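The seed convention can be sketched in a couple of lines; the 32-bit range is an assumption for the example, as seed node ranges vary:

```python
import random

def resolve_seed(seed: int) -> int:
    """Keep a fixed seed for reproducible runs; -1 means draw a fresh random seed."""
    return random.randint(0, 2**32 - 1) if seed == -1 else seed
```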


๐Ÿ” 1st Pass (Initial Generation)

  • KSampler: Runs the initial inference.

  • VAE Decode & Video Combine: Decodes latent space to image, combines with source.

  • Slow Motion / PlaySound: Optional audio sync and slow-mo settings.

  • Select last frame for 2nd pass start frame. (Pop Up window)


๐Ÿ” 2nd Pass (Refine & Extend)

  • Similar to 1st Pass but optimized for longer inference or higher quality.

  • Take last frame from 1st pass as 2nd pass starting image.

  • Get Mask Range From Clip: Extracts mask regions for attention.

  • Image Batch Multi: Processes multiple frames simultaneously.
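The hand-off between passes boils down to one rule, sketched here outside ComfyUI for clarity (the frame list stands in for whatever batch the 1st pass emits):

```python
def second_pass_start(first_pass_frames: list):
    """The 2nd pass starts from the last frame the 1st pass produced."""
    if not first_pass_frames:
        raise ValueError("1st pass produced no frames")
    return first_pass_frames[-1]
```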


📈 Upscaling & Frame Interpolation

  • Image Sharpen / Restore Faces: Post-processing enhancements.

  • Upscale Image (Real-ESRGAN or similar).

  • Frame Interpolation (RIFE / FILM): Smooth transitions for higher FPS.

  • Slow Motion: Optional; adds frames and blends them for cinematic slow-mo.
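The frame-count arithmetic behind interpolation and slow motion can be sketched as follows; the 2x factor and fps values are examples, not fixed workflow settings:

```python
def interpolated_count(frames: int, factor: int) -> int:
    """Frames after inserting (factor - 1) in-between frames for every
    adjacent pair: frames + (frames - 1) * (factor - 1)."""
    return frames + (frames - 1) * (factor - 1)

def playback_seconds(frames: int, fps: float, slowmo: float = 1.0) -> float:
    """Duration at a given fps; slowmo > 1 stretches playback time."""
    return frames / fps * slowmo
```

For example, 2x interpolation of a 231-frame pass yields 461 frames.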


🧪 Experimental (Optional, Long Runtime)

  • Advanced enhancement or second-stage denoising/refinement.

  • Useful for batch rendering with very high quality needs.

    • ⏱️ Warning: These steps significantly increase processing time.


⚡ Torch Compile Setup (VERY IMPORTANT)

To unlock native acceleration via torch.compile, ensure you meet these requirements:

✅ Requirements

  • PyTorch 2.1+ with CUDA

  • NVIDIA GPU with Ampere or later architecture (RTX 30XX, 40XX)

  • Use the latest ComfyUI nightly, or manually apply torch.compile() patching.
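As a sketch, the checks above can be expressed as a plain-Python predicate; in practice you would feed it torch.__version__, torch.cuda.is_available(), and torch.cuda.get_device_capability() (Ampere corresponds to compute capability 8.0 and up):

```python
def meets_compile_reqs(torch_version: str, cuda_ok: bool,
                       capability: tuple) -> bool:
    """True when PyTorch is 2.1+ with CUDA and the GPU is Ampere or newer."""
    # "2.1.0+cu121" -> (2, 1); the "+cuXXX" local tag is stripped first
    major, minor = (int(p) for p in torch_version.split("+")[0].split(".")[:2])
    return (major, minor) >= (2, 1) and cuda_ok and capability >= (8, 0)
```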


💾 Saving Outputs

  • Controlled via the Project Path Generator and Video Combine nodes.

  • The output format (e.g. .mp4, .png, .webm) should be set explicitly in Video Combine.


📋 Notes

  • ⚠️ The first run of torch.compile will be slow due to graph tracing.

  • 🧠 Prompt tuning is crucial for WAN 2.1; try detailed descriptions.

  • ⚠️ Not optimized for older machines.


🙋 FAQ

Q: My output is laggy or missing frames.

  • Check the interpolation and slow-motion settings; disable one if not needed.

Q: The workflow crashes during torch compile.

  • Ensure you're using PyTorch 2.1+ and your GPU is Ampere or newer.

Q: Can I use this with other models like SDXL?

  • You can, but WAN 2.1 is optimized for this specific setup. Results may vary.


📎 Credits

  • Workflow design by Kiko9

  • WAN 2.1

  • ComfyUI team for the powerful modular engine


📂 Folder Structure Example

ComfyUI/
├── models/
│   ├── checkpoints/
│   ├── vae/
│   └── clip/
├── output/
│   └── generated/
└── custom_nodes/


📊 End-to-End WAN 2.1 Generation Summary

| Step | Description | Time / Count | Resolution |
| --- | --- | --- | --- |
| Prompt Start | Initial prompt execution begins | 92.95 sec | - |
| Model Load | Loaded WAN 2.1 model weights | ~15,952 ms | - |
| First Comfy-VFI Pass | Generated frames with TeaCache initialized | ~6 min 13 sec | 480x832 |
| Frames Generated (1st pass) | Comfy-VFI output | 231 frames | 480x832 |
| Second Comfy-VFI Pass | Repeats generation with the same steps | ~6 min 28 sec | 480x832 |
| Frames Generated (2nd pass) | Comfy-VFI output | (implied) | 480x832 |
| WanVAE Load (1st) | Loaded latent space model | ~1220 ms | - |
| WanVAE Load (2nd) | Loaded again for reuse | ~1304 ms | - |
| Face Restoration (GFPGAN) | GFPGANv1.4 restored images | 152 frames | 512x512 |
| Comfy-VFI Run (3rd) | Generated additional frames | unknown | 960x1664 |
| Frames Generated (3rd pass) | Comfy-VFI output | 456 frames | 960x1664 |
| Comfy-VFI Run (4th) | Final batch of generation | unknown | 960x1664 |
| Frames Generated (4th pass) | Comfy-VFI output | 304 frames | 960x1664 |
| Prompt End | Final step of pipeline | 1050.60 sec | - |

โ„น๏ธ Notes:

  • "TeaCache skipped" 12 conditional + 12 unconditional steps per 30 = ~20% optimization.

  • Face restoration step was applied to a subset (152 frames).

  • The 960x1664 resolution used in the last two passes matches the 480x832 aspect ratio perfectly, ideal for downscaling or 2x video output.

๐Ÿ—จ๏ธ Feedback & Contributions

Feel free to submit issues if you encounter bugs or want to contribute improvements.


๐Ÿ”ฅ Happy rendering!