🎬 Professional Video-to-Video Transformation with WAN VACE
Transform your videos with this ComfyUI workflow for WAN VACE. The pipeline handles video-to-video transformation of long-form videos and includes segment joining, upscaling, and frame interpolation. Break a lengthy video into manageable segments, process them individually, and combine them back into a single cohesive, high-quality output.
✨ Key Features
Long Video Processing: Handle extended video content by breaking it into segments and rejoining them seamlessly
Complete V2V Pipeline: Full video-to-video transformation workflow
Seamless Video Joining: Custom nodes for professional video concatenation without visible transitions
Multi-Step Process: Generate → Join → Combine → Upscale → Interpolate
Professional Quality: High-quality output with customizable settings
Memory Optimization: Low VRAM options for various GPU configurations
Batch Processing: Process multiple video segments efficiently
Scalable Architecture: Handle videos of any length through intelligent segmentation
📋 Requirements
Essential Model Files
🔴 WAN GGUF Models
Download from: QuantStack/Wan2.1_T2V_14B_FusionX_VACE-GGUF
Choose your preferred quantization (Q3_K_S, Q8_0, etc.)
Place in: ComfyUI/models/unet
🟣 WAN VAE
Download wan_2.1_vae.safetensors from: Comfy-Org/Wan_2.1_ComfyUI_repackaged
Place in: ComfyUI/models/vae
🟣 WAN Text Encoder
Download GGUF text encoders from: city96/umt5-xxl-encoder-gguf
Place in: ComfyUI/models/text_encoders
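To confirm the files landed in the right folders before opening the workflow, a short check like the following may help. It is a minimal sketch: the function name missing_models and the "ComfyUI" root path are assumptions for illustration, and it only tests that a matching file exists, not that the download is complete or the right variant.

```python
from pathlib import Path

# Folder -> filename pattern, taken from the list above.
# Assumption: any .gguf file counts, since the quantization is your choice.
EXPECTED = {
    "models/unet": "*.gguf",
    "models/vae": "wan_2.1_vae.safetensors",
    "models/text_encoders": "*.gguf",
}


def missing_models(comfy_root):
    """Return the subfolders of comfy_root that hold no matching model file."""
    root = Path(comfy_root)
    missing = []
    for folder, pattern in EXPECTED.items():
        target = root / folder
        if not target.is_dir() or not any(target.glob(pattern)):
            missing.append(folder)
    return missing


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at your own install folder.
    for folder in missing_models("ComfyUI"):
        print(f"No matching model file found in ComfyUI/{folder}")
```

If the script prints nothing, each of the three folders contains at least one file with the expected name or extension.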
Required Custom Nodes
⚠️ Important: Download these custom nodes from this page (not available in ComfyUI-Manager):
ComfyUI Extensions
⚙️ Install these custom nodes using the ComfyUI-Manager:
ComfyUI-GGUF
ComfyUI-VideoHelperSuite
ComfyUI-KJNodes
ComfyUI-ControlNet-Aux
ComfyUI-Frame-Interpolation
ComfyUI-Easy-Use
📖 Step-by-Step Guide
Initial Setup
Configure Constants:
Width/Height: 576x1024 (9:16 aspect ratio) or match your source video
Length: 81 frames per segment
Skip Frames: Start with 0
Filename Prefix: Set your output folder and prefix
Load Source Materials:
Load your source video for restyling
Load reference image (ensure similar pose to first video frame)
Use SDXL/FLUX with LoRA and ControlNet for best pose matching
Step 1: Generate WAN Videos
Write Prompts:
Describe subject, outfit, and background
Include action phrases for dynamic results
Generate Video Segments:
Click Run to generate the first 81-frame video segment
Increase skip frames by 81 to process the next segment
Repeat until you have covered the full length of your source video
The final segment can be shorter than 81 frames, but its quality may be lower
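The skip-frame values for each run follow directly from the segment length. The sketch below lists them for a source video of a given frame count; the function name skip_frame_schedule and the 200-frame example are assumptions for illustration only.

```python
def skip_frame_schedule(total_frames, segment_length=81):
    """Return (skip_frames, frames_in_segment) for each run of Step 1."""
    schedule = []
    skip = 0
    while skip < total_frames:
        # The last segment may be shorter than segment_length.
        schedule.append((skip, min(segment_length, total_frames - skip)))
        skip += segment_length
    return schedule


# Example: a hypothetical 200-frame source video needs three runs.
for run, (skip, count) in enumerate(skip_frame_schedule(200), start=1):
    print(f"Run {run}: skip frames = {skip}, frames in segment = {count}")
```

For 200 frames this gives skip values of 0, 81, and 162, with a final segment of 38 frames.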
Step 2: Join Videos Seamlessly
Configure Joining:
Set folder path to your generated videos
Set filename prefix matching your generated files
Start with filename suffix = 1
Use the same prompt from Step 1
Join Process:
Run to join first and second videos
Increase filename suffix by 1
Run to join second and third videos
Repeat until all segments are joined
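Each join run pairs one segment with the next, so a video split into N segments needs N - 1 join runs. As a rough sketch, assuming the filename suffix simply equals the number of the first segment in each pair (which is what the steps above imply), the runs can be listed like this; join_runs is a hypothetical helper name.

```python
def join_runs(segment_count):
    """Return (filename_suffix, first_segment, second_segment) for each join run."""
    # Suffix k joins segment k with segment k + 1.
    return [(suffix, suffix, suffix + 1) for suffix in range(1, segment_count)]


# Example: four generated segments need three join runs.
for suffix, first, second in join_runs(4):
    print(f"Suffix {suffix}: join video {first} and video {second}")
```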
Step 3: Combine, Upscale, and Interpolate
Final Processing Setup:
Set folder path to joined videos
Keep filename suffix = 1 (constant)
Set combine filename for final output
Set upscale filename for enhanced version
Execute Final Pipeline:
Combine all joined videos
Upscale using RealESRGAN (2x scale)
Interpolate frames using FILM VFI (2x frame rate)
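To anticipate the size of the final file, the effect of the 2x upscale and 2x interpolation can be worked out ahead of time. This is simple arithmetic rather than anything the workflow runs; final_output is a hypothetical helper, and the 16 fps source frame rate in the example is an assumption, so substitute the rate your combined video actually uses.

```python
def final_output(width, height, fps, upscale=2, interpolation=2):
    """Return (width, height, fps) after upscaling and frame interpolation."""
    return width * upscale, height * upscale, fps * interpolation


# Example with the 576x1024 setting from Initial Setup and an assumed 16 fps.
out_w, out_h, out_fps = final_output(576, 1024, 16)
print(f"{out_w}x{out_h} at {out_fps} fps")
```

With those inputs the result is 1152x2048 at 32 fps.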
⚙️ Advanced Settings
Low VRAM Configuration
Use the UnetLoaderGGUFDisTorchMultiGPU node for memory optimization
Set virtual_vram_gb to 2.0-4.0 for GPUs with 12 GB of VRAM or less
Enable use_other_vram for additional memory fallback
Performance Optimization
Bypass PathchSageAttentionKJ and ModelPatchTorchSettings if you don't have Triton
Adjust batch sizes based on your GPU memory
Use appropriate quantization levels for your hardware
🎯 Tips for Best Results
Long Video Strategy: Plan your segmentation approach - 81 frames per segment ensures smooth transitions while maintaining manageable processing chunks
Reference Image Quality: Use high-quality reference images with poses similar to your source video's first frame
Prompt Engineering: Be specific about subject details, clothing, and background elements
Segment Planning: Plan your video segments to maintain narrative continuity across the entire video length
Hardware Considerations: Adjust settings based on your GPU capabilities - longer videos benefit from optimized VRAM settings
Consistency Maintenance: Keep prompts consistent across all segments to ensure visual coherence in the final long video
🩺 Troubleshooting
OOM Errors: Increase virtual_vram_gb or reduce batch sizes
Missing Nodes: Ensure all custom nodes are properly installed
Quality Issues: Check reference image alignment and prompt specificity
Slow Processing: Consider a smaller quantization (for example Q3_K_S rather than Q8_0) for faster generation
🔧 Custom Nodes Parameter Guide
WanVideoVaceSeamlessJoin Node
This custom node seamlessly joins two video clips with intelligent masking for smooth transitions.
Parameters:
mask_last_frames (INT): Number of frames to mask at the end of the first video
Default: 0
Range: 0-20
Use 0 for no masking, 5-10 for subtle blending
mask_first_frames (INT): Number of frames to mask at the beginning of the second video
Default: 10
Range: 0-20
Recommended: 10 frames for smooth transitions
frame_load_cap (INT): Maximum number of frames to load from each video
Default: 81
Range: 1-1000
Should match your segment length (typically 81)
first_video_path (STRING): Full path to the first video file
Format: "C:\path\to\video1.mp4"
Use absolute paths for reliability
second_video_path (STRING): Full path to the second video file
Format: "C:\path\to\video2.mp4"
Ensure the file exists and is accessible
Outputs:
image: Combined video frames as an image sequence
mask: Generated mask for the transition area
CombineVideoClips Node
This node combines multiple video clips into a single continuous sequence with advanced masking options.
Parameters:
frame_load_cap (INT): Maximum frames to load per video
Default: 81
Range: 1-1000
Should match your segment frame count
mask_last_frames (INT): Frames to mask at the end of each video (except the last)
Default: 0
Range: 0-20
Use 0 for clean cuts, 5-10 for fade effects
mask_first_frames (INT): Frames to mask at the beginning of each video (except the first)
Default: 10
Range: 0-20
Recommended: 10 for smooth transitions
first_video_path (STRING): Path to the first video in the sequence
Base video, typically your original generated video
first_joined_video_path (STRING): Path to the first seamlessly joined video
Result from the first WanVideoVaceSeamlessJoin operation
second_joined_video_path (STRING): Path to the second seamlessly joined video
Result from the second WanVideoVaceSeamlessJoin operation
third_joined_video_path (STRING): Path to the third seamlessly joined video
Continue the pattern for additional segments
fourth_joined_video_path (STRING): Path to the fourth seamlessly joined video
Optional, use if you have this many segments
fifth_joined_video_path (STRING): Path to the fifth seamlessly joined video
Optional, maximum supported segments
last_video_path (STRING): Path to the final video in the sequence
The last generated video segment
Output:
image: Combined video sequence as image frames, ready for final processing
Parameter Optimization Tips:
For Seamless Joining:
Short transitions: mask_first_frames = 5, mask_last_frames = 0
Smooth blending: mask_first_frames = 10, mask_last_frames = 5
Long crossfades: mask_first_frames = 15, mask_last_frames = 10
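If you switch between these combinations often, it can help to keep them in one place and check them against the 0-20 range before typing them into the node. The following is only an illustrative sketch: the preset names and the mask_settings helper are made up here, and the values are the ones listed above.

```python
MASK_MIN, MASK_MAX = 0, 20  # range documented for both mask parameters

# Preset names are hypothetical labels for the three combinations above.
PRESETS = {
    "short": {"mask_first_frames": 5, "mask_last_frames": 0},
    "smooth": {"mask_first_frames": 10, "mask_last_frames": 5},
    "long": {"mask_first_frames": 15, "mask_last_frames": 10},
}


def mask_settings(preset):
    """Return a copy of the chosen preset after checking the value range."""
    settings = dict(PRESETS[preset])
    for name, value in settings.items():
        if not MASK_MIN <= value <= MASK_MAX:
            raise ValueError(f"{name}={value} is outside {MASK_MIN}-{MASK_MAX}")
    return settings


print(mask_settings("smooth"))
```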
For File Paths:
Ensure all video files exist before running
Use consistent naming conventions for easier batch processing
Frame Count Considerations:
Set frame_load_cap to match your segment length (usually 81)
Smaller values may truncate longer segments
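The truncation effect can be illustrated with a one-line calculation. This sketch assumes the cap simply limits how many frames are read from each clip, which is how the parameter description reads; the node's actual loading code is not shown here, and frames_loaded is a hypothetical helper.

```python
def frames_loaded(segment_frames, frame_load_cap=81):
    """Return how many frames of a segment survive the load cap."""
    return min(segment_frames, frame_load_cap)


# An 81-frame segment loads fully at the default cap but loses frames at 60.
print(frames_loaded(81))      # cap matches the segment length
print(frames_loaded(81, 60))  # cap smaller than the segment
```

In the second call, 21 frames of the segment would be dropped under this assumption.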
This workflow provides professional-grade video transformation capabilities with comprehensive control over the entire pipeline from generation to final output.