How to create Image2Video with EasyWanVideo

What is WAN 2.1?

Wan is an advanced and powerful visual generation model developed by Tongyi Lab of Alibaba Group. It can generate videos based on text, images, and other control signals. The Wan2.1 series models are now fully open-source.

https://github.com/Wan-Video/Wan2.1

What is EasyWanVideo?

EasyWanVideo is a tool that makes it easy to use the open-source video generation AI model Wan 2.1 on Windows.

It is a ComfyUI variant by Zuntan. (https://civitai.com/user/Zuntan)

https://github.com/Zuntan03/EasyWanVideo

System Requirements

EasyWanVideo runs on Windows 10 or 11 PCs with NVIDIA GPUs.

The recommended specs are:

GPU: NVIDIA GeForce RTX 3060 12GB or higher
RAM: 32GB or more (48GB+ recommended for the Kijai version)
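
As a quick sanity check, the recommended specs can be written down as a small Python helper. This is only a sketch: the function name is hypothetical, it is not part of EasyWanVideo, and it reads "RTX 3060 12GB or higher" as "at least 12 GB of VRAM", which is an assumption.

```python
# Recommended minimums taken from the list above. The helper itself is
# hypothetical and is not shipped with EasyWanVideo.
RECOMMENDED_VRAM_GB = 12       # assumption: "RTX 3060 12GB or higher" means >= 12 GB VRAM
RECOMMENDED_RAM_GB = 32
RECOMMENDED_RAM_GB_KIJAI = 48  # the higher figure quoted for the Kijai version


def meets_recommended(vram_gb: float, ram_gb: float, kijai: bool = False) -> bool:
    """Return True if the given VRAM and RAM reach the recommended specs."""
    ram_needed = RECOMMENDED_RAM_GB_KIJAI if kijai else RECOMMENDED_RAM_GB
    return vram_gb >= RECOMMENDED_VRAM_GB and ram_gb >= ram_needed


print(meets_recommended(12, 32))              # True
print(meets_recommended(12, 32, kijai=True))  # False
```

Fill in the two numbers from your own machine (for example from Task Manager) to see which recommendation you meet.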

Installation Steps

Download EasyWanVideo: Get the installer (EasyWanVideoInstaller.bat) from GitHub.
Run the Installer: Place the batch file in a suitable folder (e.g., C:\EasyWan\) and execute it.
Download the SageAttention requirements: run SetupSageAttention.bat.
Install SageAttention: run vs_buildtools.exe, select "Desktop development with C++", and install Visual Studio Build Tools 2022.
Install the NVIDIA CUDA Toolkit using cuda_12.8.1_windows_network.exe located in EasyWanVideo/SageAttention/.
[NOTE] If you have used SageAttention in a different environment, delete the Triton cache using EasyWanVideo/SageAttention/DeleteTritonCache.bat.

Launch Tool

You can launch ComfyUI by running ComfyUi.bat.

Open Workflow

Click the folder icon on the left and open one of the following JSON files from the browse list.

00_Kijai_I2v: The main workflow for I2V (Image to Video) generation.

55_ConcatInterpolate: Smoothly concatenates two videos.

60_Upscale: Upscales the video using an ESRGAN-based upscaler.

70_Interpolate: Interpolates video frames and converts them to .mp4.

How to Use Main Workflow

I will use this picture as an example.

This is the area where settings need to be changed in the workflow. Other than this, there is generally no need to modify anything.

Load Image
Select your image.
Setting
[IMPORTANT]
"Saving the final frame image" is necessary for creating videos longer than 5 seconds.
"Generating prompts from an image" should be adjusted depending on whether the output video matches your desired result.
Setting For Faster Gen, High Quality Gen
[NOTE]
I don't have an RTX 40-series or later GPU, so I disabled "sageattention".
If you have an RTX 40-series or later GPU, you should try enabling it.
Load Wan2.1 Model Setting
If you want to create a 480-pixel-high video, use:
Wan2_1-I2V-14B-480P_fp8_e4m3fn.safetensors
If you want to create a 720-pixel-high video, use:
Wan2_1-I2V-14B-720P_fp8_e4m3fn.safetensors
[NOTE]
DO NOT USE "Wan2_1-T2V-1_3B_fp8_e4m3fn.safetensors". This model is for Text2Video.
Setting Height and Length
Select 480 or 704.
I think 704 is wrong, so I changed it to 720.
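The pairing between the video height and the model file can be summed up in a few lines of Python. This is a minimal sketch, not something the workflow runs: the function name is hypothetical, and only the two file names come from this guide.

```python
# Hypothetical helper: maps the target video height to the I2V model file
# named in this guide. It is an illustration, not part of the workflow.
I2V_MODELS = {
    480: "Wan2_1-I2V-14B-480P_fp8_e4m3fn.safetensors",
    720: "Wan2_1-I2V-14B-720P_fp8_e4m3fn.safetensors",
}


def pick_i2v_model(height: int) -> str:
    """Return the I2V model file for a 480 or 720 pixel high video."""
    if height not in I2V_MODELS:
        raise ValueError(f"No I2V model listed for height {height}; use 480 or 720.")
    return I2V_MODELS[height]


print(pick_i2v_model(480))
print(pick_i2v_model(720))
```

Any other height raises an error here on purpose, so a mismatch between the height setting and the loaded model is noticed early.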
Load Wan2.1 LoRA
Wan 2.1 LoRA makes it very easy to reproduce specific motions.
In this workflow, you can apply up to three LoRAs simultaneously. Toggle the activation to enable or disable them.
The strength of the LoRA can be adjusted from the nodes at the top.
[IMPORTANT]
To enable LoRA, you need to enter the trigger word in the prompt input field.
You can check the trigger word by following these steps:
Input Prompt
Enter the prompt required to generate the video.
Input the desired motion.
Example prompts can be found on the WAN 2.1 website. (https://wan.video/)
If automatic prompt generation is enabled, the prompt generated automatically will be added after the prompt you enter here.
Please make sure to enter the trigger word so that the LoRA from step 6 is enabled.
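The way the final prompt is put together can be sketched in Python as follows. This is an illustration only: the function name, the ", " separator, and the trigger word "my_motion" are all assumptions, and the workflow may join the two prompts differently.

```python
def build_prompt(user_prompt: str, auto_prompt: str = "", trigger_words=()):
    """Append the auto-generated prompt after the user prompt.

    Returns the combined prompt and the list of trigger words that are
    missing from the user prompt. The ", " separator is an assumption.
    """
    full = user_prompt.strip()
    if auto_prompt.strip():
        full = f"{full}, {auto_prompt.strip()}"
    missing = [w for w in trigger_words if w.lower() not in user_prompt.lower()]
    return full, missing


# "my_motion" is a made-up trigger word used only for this example.
full, missing = build_prompt(
    "my_motion, a cat walks toward the camera",
    auto_prompt="a fluffy cat on grass",
    trigger_words=["my_motion"],
)
print(full)
print(missing)  # an empty list means every trigger word was entered
```

If the second value is not empty, the listed trigger words still need to be typed into the prompt input field.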
TIPS
If the input image is smaller than the generated video, use lanczos.
If the input image is larger than the generated video, use area.
When enlarging:
- nearest-exact: Simply copies the nearest pixel from the original image. The original pixels remain as they are, resulting in a jagged image.
- bilinear: Selects 4 nearby pixels from the original image and averages them according to their distance. Specifically, linear interpolation is applied to the top 2 points, the bottom 2 points, and then linear interpolation is applied between those results. This creates a blurred image, and anisotropic artifacts may appear.
- bicubic: Similar to bilinear but uses cubic spline interpolation instead of linear interpolation. This also results in a blurred image.
- area: Same as nearest-exact.
When reducing:
- nearest-exact: Simply copies the nearest pixel from the original image. This can lead to moiré patterns or unnatural dots remaining in the image.
- bilinear: Despite reducing, it behaves similarly to enlargement, which can result in moiré patterns or unnatural dots.
- bicubic: Despite reducing, it behaves similarly to enlargement, which can result in moiré patterns or unnatural dots.
- area: Uses adaptive average pooling for reduction. It averages pixels from the original image to match the corresponding pixels in the reduced image, adjusting so that the overlap of the original image's pixels is less than or equal to 1. This helps avoid unnecessary blur and reduces issues like moiré patterns.
Reference (https://comfyui.creamlab.net/nodes/ImageScale)
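To make the difference concrete, here is a small pure-Python sketch. It assumes a one-dimensional row of pixels and a simplified nearest sampling (it always keeps the first pixel of each group, which is not the exact index rule of nearest-exact), and the function names are hypothetical. It shows the rule of thumb from the tips and why averaging avoids the artifacts described for reduction.

```python
def choose_method(src_w: int, src_h: int, dst_w: int, dst_h: int) -> str:
    """Rule of thumb from the tips: lanczos when enlarging, area otherwise."""
    return "lanczos" if src_w * src_h < dst_w * dst_h else "area"


def reduce_nearest(row, factor):
    """Simplified nearest sampling: keep one pixel out of every `factor`."""
    return row[::factor]


def reduce_area(row, factor):
    """Area-style reduction: average each group of `factor` pixels."""
    return [sum(row[i:i + factor]) / factor for i in range(0, len(row), factor)]


stripes = [0, 255] * 4  # fine black/white stripes
print(choose_method(512, 512, 832, 480))  # lanczos (input has fewer pixels)
print(reduce_nearest(stripes, 2))         # [0, 0, 0, 0]
print(reduce_area(stripes, 2))            # [127.5, 127.5, 127.5, 127.5]
```

With nearest sampling the stripes collapse into solid black, which is the kind of artifact described above, while the area-style average keeps the overall gray level.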
RUN !!
The output results will be saved in \Output\yyyy-MM-dd

11. To create a continuation of the video, load the LastImage .webp file, which is output together with the normal .webp, in Load Image from step 1 and run the workflow again.

How to Use ConcatInterpolate Workflow

The output results will be saved in \Output\yyyy-MM-dd\MMdd_HHmmss_ConcatInterpolate_00001_.webp

How to Use Upscale Workflow

With the default settings, the ESRGAN upscaler enlarges the image 4 times, and the Upscaler then scales the result by 0.5, so the final image is twice the original size.
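
The arithmetic behind the default settings is short enough to check in Python. This is a sketch with a hypothetical function name; the 832x480 input size is only an example value.

```python
def final_size(width: int, height: int, esrgan_scale: float = 4.0, post_scale: float = 0.5):
    """Return the output size after the 4x ESRGAN pass and the 0.5x resize."""
    factor = esrgan_scale * post_scale  # 4 * 0.5 = 2 with the defaults
    return round(width * factor), round(height * factor)


print(final_size(832, 480))  # (1664, 960)
```

Changing either scale changes the overall factor, so for example a post scale of 0.25 would bring the video back to its original size.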

The output results will be saved in \Output\yyyy-MM-dd\MMdd_HHmmss_Upscale_00001_.webp
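
If the pattern above is read as a date folder followed by a month-day and time stamp, a file name can be predicted with a few lines of Python. This is a sketch under that assumption: the function name is hypothetical, and the counter is simply fixed to 1 as in the pattern.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def expected_output(when: datetime, workflow: str, ext: str, counter: int = 1) -> PureWindowsPath:
    """Build Output\\yyyy-MM-dd\\MMdd_HHmmss_<workflow>_<counter>_.<ext>."""
    folder = when.strftime("%Y-%m-%d")    # yyyy-MM-dd
    stamp = when.strftime("%m%d_%H%M%S")  # MMdd_HHmmss
    return PureWindowsPath("Output") / folder / f"{stamp}_{workflow}_{counter:05d}_.{ext}"


print(expected_output(datetime(2025, 3, 30, 14, 25, 1), "Upscale", "webp"))
# Output\2025-03-30\0330_142501_Upscale_00001_.webp
```

The date and time in the example are made up; the actual stamp depends on when the workflow was run.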

How to Use Interpolate Workflow

The output results will be saved in \Output\yyyy-MM-dd\MMdd_HHmmss_Interpolate**_00001.mp4, where ** is one of the variant names below.

3

The number of frames is multiplied by 3 through interpolation, and the video speed is set to 125%.

3p

The number of frames is multiplied by 3 through interpolation, and the video speed is set to 125%.

After the video is played once, it will continue to play in reverse.

4

The number of frames is multiplied by 4 through interpolation, and the video speed is set to 94%.

4p

The number of frames is multiplied by 4 through interpolation, and the video speed is set to 94%. After the video is played once, it will continue to play in reverse.
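
The 125% and 94% figures are consistent with a 16 fps clip being interpolated and then written at 60 fps (60 / 48 = 1.25 and 60 / 64 = 0.9375). Both frame rates are assumptions that are not stated in this guide, and the 80-frame input and the function name are only for illustration. A rough Python sketch of the four variants under those assumptions:

```python
SOURCE_FPS = 16  # assumption: frame rate of the generated clip
OUTPUT_FPS = 60  # assumption: frame rate of the saved .mp4


def interpolate_stats(frames: int, multiplier: int, ping_pong: bool = False):
    """Return (frame_count, speed, seconds) for one Interpolate variant."""
    out_frames = frames * multiplier  # approximation of the interpolated count
    if ping_pong:
        out_frames *= 2  # forward playback followed by reverse playback
    speed = OUTPUT_FPS / (SOURCE_FPS * multiplier)
    seconds = out_frames / OUTPUT_FPS
    return out_frames, speed, seconds


for name, mult, pp in [("3", 3, False), ("3p", 3, True), ("4", 4, False), ("4p", 4, True)]:
    frames, speed, seconds = interpolate_stats(80, mult, pp)
    print(f"{name}: {frames} frames, {speed:.0%} speed, {seconds:.2f} s")
```

Under these assumptions the "p" variants simply double the length of the clip, because the reversed playback is appended after the forward one.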

Result

https://civitai.com/posts/14855949