
(Updated) WAN 2.1 The POWERFUL VIDEO - WF + Models + Tips (COMFYUI)


WAN 2.1 is HERE, and it is really POWERFUL

For those of us interested in local video generation, the last year and a half has been a long ordeal: an endless struggle for control over the generated video.

T2V, I2V, V2V: we have had unsatisfactory results for a long time, always struggling to find a balance between quality and speed within the limits imposed by our graphics cards. Let's not fool ourselves: not everyone can afford a 24 GB VRAM card.

The arrival of Hunyuan opened the door to a reasonable level of control, and the technical LoRAs for improving motion promised an honorable future.

Skyreel came along to reinforce this progress, giving Hunyuan improved motion capability.

But then along came WAN 2.1, and it turned the game board upside down again.

Here's what you need to know:

-It can run on 16 GB cards with no problems up to 121 frames (or even more); see the quick math after this list.

-Just like Hunyuan, it is uncensored.

-Very interesting quantizations are currently being produced that keep good performance while losing very little quality.

-It is still not clear whether it will be possible to train LoRAs with it, but they are said to be compatible with Hunyuan's (I have not checked this yet, but I will).

-(CONFIRMED: it is possible to create LoRAs for WAN, and they apparently use parameters very similar to Hunyuan's, but Hunyuan and WAN LoRAs are not interchangeable.)
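As a quick sanity check on that 121-frame figure: WAN clips are commonly generated at 16 fps, and frame counts are usually of the form 4n + 1 because of the VAE's temporal compression. Both of those are assumptions about the defaults, so verify them against your nodes; here is a minimal sketch of the math:

```python
# Rough sketch: how frame counts map to clip length for WAN 2.1.
# Assumptions (not from this post): output at 16 fps, and frame counts
# of the form 4n + 1 due to the VAE's 4x temporal compression.
FPS = 16

def nearest_valid_frames(frames: int) -> int:
    """Round to the nearest valid 4n + 1 frame count."""
    n = round((frames - 1) / 4)
    return 4 * n + 1

for seconds in (3, 5, 7.5):
    frames = nearest_valid_frames(int(seconds * FPS) + 1)
    print(f"{seconds}s -> {frames} frames ({frames / FPS:.2f}s actual)")
```

Under those assumptions, 121 frames comes out to roughly 7.5 seconds of video.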

Do you want to try it?

This is my recommendation, as of today, for a reasonably optimal setup on 16 GB VRAM cards:

You will first need (for IMG2VID):

Text encoder and VAE:

umt5_xxl_fp8_e4m3fn_scaled.safetensors goes in: ComfyUI/models/text_encoders/

wan_2.1_vae.safetensors goes in: ComfyUI/models/vae/

Video models:

The diffusion models can be found in several places, but you have a better option from THE KIJAI MASTER -> https://huggingface.co/Kijai/WanVideo_comfy/tree/main

These files go in: ComfyUI/models/diffusion_models/

You will also need clip_vision_h.safetensors, which goes in: ComfyUI/models/clip_vision/
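If you prefer to script the downloads, here is a minimal sketch using the huggingface_hub library. The repo ID and in-repo paths for the text encoder, VAE and CLIP vision are assumptions based on the usual Comfy-Org repackaged repo; verify the file lists on Hugging Face before running. For the diffusion model itself, browse Kijai's repo above and copy the quantization that fits your VRAM into ComfyUI/models/diffusion_models/ the same way.

```python
# Minimal download sketch (pip install huggingface_hub).
# Repo ID and in-repo file paths are assumptions -- verify on Hugging Face.
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

COMFY = Path("ComfyUI/models")

# (assumed) Comfy-Org repackaged repo for text encoder, VAE and CLIP vision
REPO = "Comfy-Org/Wan_2.1_ComfyUI_repackaged"
FILES = {
    "split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors": "text_encoders",
    "split_files/vae/wan_2.1_vae.safetensors": "vae",
    "split_files/clip_vision/clip_vision_h.safetensors": "clip_vision",
}

for repo_path, subdir in FILES.items():
    cached = hf_hub_download(repo_id=REPO, filename=repo_path)  # path in HF cache
    target = COMFY / subdir / Path(repo_path).name
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(cached, target)  # copy out of the cache into ComfyUI's folders
```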

And then:

-Make sure you have the latest version of ComfyUI.

-Download the attached workflow for image-to-video (sorry, I have not tested text-to-video; I prefer to have a reference image), and install the missing nodes from the Manager.

-Tip: Use the 480p model first (480x272 px), then upscale the final result to 1920x1080 with your favorite 4x upscaler (see the crop note below).
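One detail about that upscale: 480x272 at exactly 4x gives 1920x1088, so you need to trim 8 px of height to land on 1080. Here is a minimal sketch of that final step, using Pillow's Lanczos resize as a stand-in for your actual 4x upscaler (the filename is a hypothetical example):

```python
# Minimal sketch: 4x upscale plus center-crop to 1080p.
# Pillow's Lanczos is only a stand-in for a real 4x model upscaler;
# the crop math is the point: 272 * 4 = 1088, not 1080.
from PIL import Image

def upscale_frame_to_1080p(path: str) -> Image.Image:
    img = Image.open(path)               # expects a 480x272 frame
    img = img.resize((img.width * 4, img.height * 4), Image.LANCZOS)
    top = (img.height - 1080) // 2       # trim the extra 8 px of height
    return img.crop((0, top, 1920, top + 1080))

upscale_frame_to_1080p("frame_0001.png").save("frame_0001_1080p.png")
```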

-PROTIP: This model works better in Chinese than in English. You can use the Ollama nodes to translate your prompts (a script sketch follows below), or at least use this as your negative prompt:

色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走

(Rough English translation: vivid colors, overexposed, static, blurry details, subtitles, style, artwork, painting, picture, still, overall gray tint, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards.)
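And if you want to script the translation outside ComfyUI, here is a minimal sketch using the ollama Python client; the model name is an assumption, so use whatever model you have pulled locally:

```python
# Minimal sketch: translate an English prompt to Chinese via a local
# Ollama server (pip install ollama). The model name is an assumption;
# use any model you have pulled, e.g. `ollama pull llama3.1`.
import ollama

def translate_prompt(prompt: str, model: str = "llama3.1") -> str:
    response = ollama.chat(model=model, messages=[
        {"role": "system",
         "content": "Translate the user's text to Chinese. "
                    "Reply with the translation only."},
        {"role": "user", "content": prompt},
    ])
    return response["message"]["content"]

print(translate_prompt("a red fox running through snowy woods, cinematic"))
```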
