TLDR
Finding a good, properly working WAN2.2 Image to Video workflow that isn't a tangled mess to follow and isn't dependent on a million QoL custom node packages has been all but impossible. After spending several weeks experimenting and deepening my understanding of WAN2.2, I've pared a WAN2.2 workflow down to its most basic necessary elements and uploaded it for others to use as their foundation. Make no mistake, however: despite its simplicity, this workflow should yield optimal WAN2.2 results.
The attached workflow has only two dependencies, both of which are necessary: a frame interpolator to get smooth video, and the RES4LYF package, which provides the beta57 scheduler - an absolute must for WAN2.2.
Basic Workflow Explanation
The workflow moves from left to right in groups. The Inputs group is where you set the starting frame, your positive and negative prompt, and the output video dimensions.
The Video Configuration group is where you set the video length, Shift, total number of steps, and the split of high and low noise steps. I suggest you leave them at their current values when you are first starting out (Shift of 8.0, length of 81 frames, 8 total steps, and a high/low split of 3, meaning 3 high noise steps followed by 5 low noise steps). Further below, I get into what these values mean and what you can expect to happen when you change them.
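If it helps to see those defaults in one place, here's a tiny summary sketch (the key names are mine, not actual ComfyUI node names):

```python
# Suggested starting values from the Video Configuration group.
# Key names are illustrative, not ComfyUI node names.
DEFAULTS = {
    "shift": 8.0,           # leave at 8.0 for WAN2.2 (see the beta57 section)
    "length": 81,           # frames; roughly 5 seconds at WAN's native 16 FPS
    "total_steps": 8,       # total sampling steps across both samplers
    "high_noise_steps": 3,  # first 3 steps run on the high noise model
}

# Whatever is left over runs on the low noise model.
low_noise_steps = DEFAULTS["total_steps"] - DEFAULTS["high_noise_steps"]
print(low_noise_steps)  # 5
```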
The Generate Video group is where you pick your content loras and set the LightX2V/Lightning lora strengths and CFG values.
Finally, the Save Output group saves both a 30FPS video as well as the individual frames. I like to save the frames so that I can upscale them and make bigger videos. Check out my other articles for an upscaling workflow that is more than just blowing up the video.
Model Files and Where to Find Them
Below are the components you will need, with links on where to download them. Most of these files are on HuggingFace; just click the "download" link towards the left side of the page. Most also have a low quality alternative. Use it if your generation time is too slow or if your system doesn't have enough RAM and VRAM to run the full quality version. Quality will take a hit, but that's better than not being able to generate anything.
WAN2.1 VAE (Yes, 2.1. There is no 2.2 VAE): https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/vae/wan_2.1_vae.safetensors
Save location: ComfyUI/models/vae
umt5_xxl_fp16 CLIP: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp16.safetensors
Low quality alternative: https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
Save location: ComfyUI/models/text_encoders
WAN2.2 High Noise model: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/split_files/diffusion_models/wan2.2_i2v_high_noise_14B_fp16.safetensors
Low quality alternative: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/split_files/diffusion_models/wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors
Save location: ComfyUI/models/diffusion_models
WAN2.2 Low Noise model: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/split_files/diffusion_models/wan2.2_i2v_low_noise_14B_fp16.safetensors
Low quality alternative: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/blob/main/split_files/diffusion_models/wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors
Save location: ComfyUI/models/diffusion_models
High Noise LightX2V lora: https://huggingface.co/lightx2v/Wan2.2-Distill-Loras/blob/main/wan2.2_i2v_A14b_high_noise_lora_rank64_lightx2v_4step_1022.safetensors
Save location: ComfyUI/models/loras
Low Noise LightX2V lora: https://huggingface.co/lightx2v/Wan2.2-Distill-Loras/blob/main/wan2.2_i2v_A14b_low_noise_lora_rank64_lightx2v_4step_1022.safetensors
Save location: ComfyUI/models/loras
Download and save the files above, crack open ComfyUI and open this workflow. Provide your starting frame (the one in this workflow is further below), configure your content loras as you see fit, and hit Run. Everything should work.
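If you'd rather script the downloads than click through HuggingFace, here's a sketch using the huggingface_hub package (assumed installed via pip install huggingface_hub). The repo IDs and file paths are taken from the links above; COMFYUI_ROOT is a placeholder for your own install path.

```python
from pathlib import Path

COMFYUI_ROOT = Path("ComfyUI")  # placeholder: point this at your ComfyUI install

# (repo id, file path inside the repo, ComfyUI subfolder to save into)
FILES = [
    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
     "split_files/vae/wan_2.1_vae.safetensors",
     "models/vae"),
    ("Comfy-Org/Wan_2.1_ComfyUI_repackaged",
     "split_files/text_encoders/umt5_xxl_fp16.safetensors",
     "models/text_encoders"),
    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
     "split_files/diffusion_models/wan2.2_i2v_high_noise_14B_fp16.safetensors",
     "models/diffusion_models"),
    ("Comfy-Org/Wan_2.2_ComfyUI_Repackaged",
     "split_files/diffusion_models/wan2.2_i2v_low_noise_14B_fp16.safetensors",
     "models/diffusion_models"),
    ("lightx2v/Wan2.2-Distill-Loras",
     "wan2.2_i2v_A14b_high_noise_lora_rank64_lightx2v_4step_1022.safetensors",
     "models/loras"),
    ("lightx2v/Wan2.2-Distill-Loras",
     "wan2.2_i2v_A14b_low_noise_lora_rank64_lightx2v_4step_1022.safetensors",
     "models/loras"),
]

def plan_downloads(root: Path = COMFYUI_ROOT):
    """Map each repo file to its destination under the ComfyUI folder."""
    return [(repo, file, root / sub / Path(file).name)
            for repo, file, sub in FILES]

def download_all(root: Path = COMFYUI_ROOT) -> None:
    """Fetch every file into place. Needs network access and plenty of disk."""
    import shutil
    from huggingface_hub import hf_hub_download

    for repo, file, dest in plan_downloads(root):
        dest.parent.mkdir(parents=True, exist_ok=True)
        cached = hf_hub_download(repo_id=repo, filename=file)
        shutil.copy(cached, dest)
        print("saved", dest)
```

Call download_all() to grab the full quality versions; swap in the fp8 file paths from the low quality links if that's what your hardware needs.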
Why beta57?
This video (not mine) provides a detailed explanation of the importance of using the right scheduler and Shift value for WAN2.2. I suggest you watch it. If you don't care to watch it or don't understand it, worry not: keep your scheduler at beta57 on both samplers and your Shift value at 8. Unless you really know what you're doing and are trying to use WAN2.2 in unintended ways to get a specific effect, you should never need to change these values.
The 3/5 Split
Most guides I've read or watched on WAN2.2 recommend a 50/50 split of steps between the high sampler and the low sampler. This is incorrect. The actual split is closer to 40/60 (explained in the video linked above). Since this is a LightX2V workflow, 8 steps with a 3/5 split lands closer to that 40/60 ratio than 6 steps with a 2/4 split or 4 steps with a 1/3 split.
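As a quick sanity check on that claim, here's the arithmetic (pure illustration, nothing WAN-specific):

```python
# Check which of the usual LightX2V step counts gets its high/low split
# closest to the ~40/60 target discussed above.
TARGET_HIGH_FRAC = 0.40  # ~40% of steps on the high noise sampler

candidates = {  # total steps -> (high noise steps, low noise steps)
    8: (3, 5),
    6: (2, 4),
    4: (1, 3),
}

def high_fraction(high: int, low: int) -> float:
    """Fraction of total steps spent on the high noise sampler."""
    return high / (high + low)

for total, (high, low) in candidates.items():
    err = abs(high_fraction(high, low) - TARGET_HIGH_FRAC)
    print(f"{total} steps, {high}/{low} split: high fraction "
          f"{high_fraction(high, low):.3f} (off by {err:.3f})")
# 3/5 -> 0.375, 2/4 -> 0.333, 1/3 -> 0.250: the 8-step split is nearest 0.40
```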
LightX2V Weights
There's a lot of information online about setting the LightX2V loras to 3.0 on the high and 1.5 on the low or whatever. That's all horseshit. Here's what you need to know:
The high noise sampler is responsible for general and broad movement. If your subjects are running or walking and there's too little motion, or they seem to be moving in slow-mo, bump up the strength of the high noise LightX2V lora in small increments until you get the desired motion. I recommend not going higher than 1.4-1.5 or weird stuff starts to happen with lights and colors.
The low noise sampler is responsible for smaller details and their movement. Think eyes blinking, mouths smiling, clothing wrinkles moving, etc. If those details are hardly moving, consider bumping up the strength of the low LightX2V lora in small increments to a max of 1.4-1.5. Past that, things start getting really erratic.
The strengths you use will depend on the content loras you are using and the desired effect. If you want a fast, shaky cam scene, you'll likely need to bump the strength of both. But if you want a slo-mo scene of someone turning their head, you may even want to go below 1.0. I always start at 1.0, do a few test generations, and then work my way up or down in small increments until I find the sweet spot for what I'm trying to achieve.
Examples
In the videos below, I demonstrate the effects of the LightX2V lora weights. The prompt for this video is:
the camera tracks the woman as she walks confidently forward through the busy streets, revealing shops and buildings along the sidewalk as well as many people walking about. the woman is looking at the camera with a slight smirk on her face. her breasts bounce with every step she takes.
In the video above, I used a strength of 1.0 on both the high and low noise LightX2V loras. The video is fine but it is a bit slow and she doesn't really smirk.
In this second video, the high noise LightX2V was increased to 1.4 while keeping the low noise LightX2V at 1.0. You can see that the video is faster and she is able to take more steps within the 5 second duration, but she only kinda, sorta smirks. That's because our small detail strength (the low noise LightX2V lora) is still too weak.
In the video above, I brought the high noise LightX2V lora back down to 1.0 and increased the low noise LightX2V lora to 1.4. This provides enough localized noise strength to get her properly smirking.
Last but not least, I used a strength of 1.4 on both LightX2V loras. As you can see, there's a lot of localized movement and general movement such as camera shake and panning.
Get a feel for playing with these values to achieve the desired effect. If you want to experiment with the same starting frame, feel free to grab it from below.

CFG
Last note: It is recommended that you keep CFG values on your samplers at 1.0 when using the LightX2V loras. If you've bumped up the strength of your LightX2V loras and the video is still not following the prompt as much as you'd like, try increasing the CFG on the high noise sampler a smidge (1.1-1.5). This can help with prompt adherence. I find that anything above 1.5, however, starts to really mess up parts of the video with crazy colors and color burning.

