Unlock the potential for infinite video narratives with this powerful ComfyUI workflow. Designed specifically for the WAN2.2 5B text-to-video model, this setup automates the creation of long, coherent video sequences by implementing an intelligent feedback loop. It doesn't just string clips together; it creates a visually consistent and dynamically evolving story.
✨ Key Features & Highlights:
AI-Powered Prompt Chaining: The core of this workflow. A multimodal LLM served through Ollama (such as Qwen2.5-VL) analyzes the last frame of each generated video clip and automatically writes a new, detailed prompt for the next segment, so each clip logically continues from the previous one (see the sketch after this list).
Perfect for Long-Form Content: Generate multi-part scenes, evolving transformations, or endless walking cycles without manual intervention. The loop is configurable to run for any number of iterations.
Superior Visual Consistency: Incorporates a color matching node (easy imageColorMatch) to harmonize the colors and tones between segments, preventing jarring visual jumps and creating a seamless flow.
Built-In Quality Enhancement: Includes a RIFE VFI frame interpolation node that doubles the frame rate of the final assembled video, resulting in buttery-smooth motion.
Fully Automated Pipeline: From loading the initial image to rendering the final high-quality video, the process is hands-free after the initial setup.
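As a concrete illustration of the prompt-chaining step, here is a minimal sketch using the official ollama Python client against a local server on the default port. The next_prompt helper and its instruction text are illustrative, not the workflow's exact node configuration:

```python
# Minimal sketch of the prompt-chaining step, using the official ollama
# Python client. The helper name and instruction text are illustrative;
# the actual workflow performs this call through the ComfyUI-Ollama node.
import ollama

def next_prompt(last_frame_path: str, model: str = "qwen2.5-vl:7b") -> str:
    """Ask the vision model for a movement-focused prompt that continues the scene."""
    response = ollama.chat(
        model=model,
        messages=[{
            "role": "user",
            "content": (
                "Describe the motion in this frame, then write a short, "
                "movement-focused text-to-video prompt that continues the "
                "scene naturally."
            ),
            "images": [last_frame_path],  # the extracted last frame
        }],
    )
    return response["message"]["content"]
```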
🛠️ How It Works:
Preparation: The workflow starts with your initial image, which is scaled and analyzed.
Ollama Vision Analysis: The LLM examines the image and generates a dynamic, movement-focused prompt tailored for the WAN2.2 model.
Video Generation: The WAN2.2 5B model generates a short video clip (~5 seconds) based on this AI-crafted prompt.
Loop & Refine: The last frame is extracted, color-corrected, and fed back to Ollama to generate the next prompt. The loop repeats for your chosen number of iterations (sketched in code after these steps).
Final Assembly: All individual clips are combined into a single, smooth, long-form video file.
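In code terms, the loop the graph automates looks roughly like the sketch below. generate_clip, extract_last_frame, color_match, concat, and interpolate are hypothetical placeholders standing in for the WAN2.2 sampler, frame selector, easy imageColorMatch, video combine, and RIFE VFI nodes; next_prompt is the Ollama call sketched earlier:

```python
# Control-flow sketch of the feedback loop. The function names below are
# hypothetical stand-ins for ComfyUI nodes, not real API calls.
def run_loop(initial_image, num_iterations: int):
    clips = []
    frame = initial_image
    prompt = next_prompt(frame)              # Ollama vision analysis
    for _ in range(num_iterations):          # configurable loop count
        clip = generate_clip(prompt)         # WAN2.2 5B renders a ~5 s segment
        frame = extract_last_frame(clip)
        # Harmonize tones between segments; using the initial image as the
        # color reference is an assumption of this sketch.
        frame = color_match(frame, reference=initial_image)
        clips.append(clip)
        prompt = next_prompt(frame)          # chain the next segment's prompt
    return interpolate(concat(clips))        # RIFE VFI doubles the frame rate
```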
📦 What's Included:
.json workflow file for ComfyUI.
A detailed breakdown of the node groups and their functions.
Recommended settings for optimal results.
⚙️ Recommended Models:
Text-Image-to-Video: wan2.2_ti2v_5B_fp16.safetensors
LoRA: Wan2_2_5B_FastWanFullAttn_lora_rank_128_bf16.safetensors (for faster generation)
VAE: wan2.2_vae.safetensors
LLM (for Ollama): A vision-capable model like qwen2.5-vl:7b or llava-1.6
🎯 Ideal For:
Creating music videos with evolving visuals.
Generating long animations and story sequences.
Producing dynamic social media content loops.
Experimenting with AI-driven storytelling and scene progression.
Disclaimer: This workflow requires a properly configured ComfyUI environment with the necessary custom nodes (ComfyUI-Easy-Use, Video-Helper-Suite, ComfyUI-Ollama, ComfyUI-Frame-Interpolation) and an Ollama server running with a vision model.
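As a quick preflight check before the first run, you can confirm the Ollama server is reachable and see which models it has pulled; this sketch assumes the default endpoint at localhost:11434:

```python
# Preflight check: list the models the local Ollama server has pulled.
# Assumes the default endpoint; adjust the URL if your server differs.
import requests

tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
names = [m["name"] for m in tags.get("models", [])]
print("Available Ollama models:", names)
# Expect a vision-capable entry here, e.g. qwen2.5-vl:7b or a llava variant.
```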