Type | Workflows
Stats | 2,415
Reviews | 193
Published | Apr 22, 2025
Hash | AutoV2 CA966C1EB1
Generate a base video with Wan 2.1 480p, then upscale it and smooth it out with the 1.3B t2v model.
Using upscaling models directly on videos tends to produce poor results: the frames end up looking disjointed. Running just a few low-denoise passes of the 1.3B t2v model does a great job of taking upscaled videos and smoothing them back into something more natural-looking. The 14B t2v model produces even better results, but requires large amounts of VRAM and time; the 1.3B t2v model does a surprisingly good job and is fairly quick.
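If you're curious what a low-denoise pass is doing under the hood, here is a minimal, self-contained sketch of the idea in plain PyTorch. It is not the actual ComfyUI/Wan code; the denoiser callable, sigma schedule, and latent shape are all stand-ins.

```python
import torch

def low_denoise_pass(latents: torch.Tensor, denoiser, num_steps: int = 20,
                     denoise: float = 0.1) -> torch.Tensor:
    """Re-noise video latents slightly, then run only the tail of the schedule.

    latents  : (batch, channels, frames, height, width) latent video
    denoiser : callable(x, sigma) -> predicted clean latents; stand-in for
               the 1.3B t2v model (hypothetical signature, not ComfyUI's API)
    denoise  : 0.0 returns the input untouched, 1.0 regenerates from pure noise
    """
    sigmas = torch.linspace(1.0, 0.0, num_steps + 1)   # toy descending schedule

    # denoise=0.1 with 20 steps -> start at step 18, i.e. only 2 steps run,
    # so most of the upscaled video's structure is preserved.
    start = int(round(num_steps * (1.0 - denoise)))
    if start >= num_steps:
        return latents

    # Partially re-noise the upscaled latents to the starting sigma level.
    x = latents + torch.randn_like(latents) * sigmas[start]

    for i in range(start, num_steps):
        pred = denoiser(x, sigmas[i])
        # Euler-style update: scale the remaining noise down to the next sigma.
        x = pred + (x - pred) * (sigmas[i + 1] / sigmas[i])
    return x

# Toy usage with an identity "denoiser", just to show the shapes involved.
video_latents = torch.randn(1, 16, 21, 60, 104)   # illustrative latent shape
smoothed = low_denoise_pass(video_latents, lambda x, sigma: x, denoise=0.1)
print(smoothed.shape)
```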
This workflow primarily uses GGUF-quantized models to reduce VRAM usage where possible. The current version runs comfortably on 12GB of VRAM when using the Q3 i2v model and the Q4 T5 text encoder.
(If you are running this on less than 12GB, please let me know!)
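For a rough sense of why the quants matter for VRAM, here is a back-of-the-envelope size estimate. The bits-per-weight figures are approximate llama.cpp-style values, not exact file sizes:

```python
# Back-of-the-envelope GGUF size estimate: params * bits-per-weight / 8.
# Bits-per-weight figures are approximate; real files vary a bit because
# different tensors get different quant types.
APPROX_BPW = {"Q3_K_M": 3.9, "Q4_K_M": 4.9, "Q6_K": 6.6, "F16": 16.0}

def est_size_gb(params_billions: float, quant: str) -> float:
    return params_billions * 1e9 * APPROX_BPW[quant] / 8 / 1e9

for quant in ("Q3_K_M", "Q4_K_M", "Q6_K", "F16"):
    print(f"14B model @ {quant:6s} ~ {est_size_gb(14.0, quant):5.1f} GB on disk")
print(f"1.3B model @ F16    ~ {est_size_gb(1.3, 'F16'):5.1f} GB on disk")
```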
Models Needed
Wan 2.1 i2v 480p GGUF model
Goes into models/unet
(Use the Q6_K if you have 24GB of VRAM, otherwise Q4_K_M or Q3_K_M)

Wan 2.1 1.3B t2v model
Goes into models/diffusion_models

T5 text encoder (GGUF; the Q4 quant matches the 12GB figure above)
Goes into models/text_encoders

Wan 2.1 VAE
Goes into models/vae

CLIP vision model
Goes into models/clip_vision

Any upscaler model. I recommend RealESRGAN_x2plus.
Goes into models/upscale_models
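If it helps, a quick check like this can confirm everything landed in the right folders. The ComfyUI path and the filename patterns are assumptions; adjust them to your install and the files you actually downloaded:

```python
from pathlib import Path

# Assumed ComfyUI models root; change to wherever your install lives.
COMFY = Path("ComfyUI/models")

# Directory -> glob pattern for the kind of file expected there
# (patterns are illustrative, not the exact download filenames).
expected = {
    "unet": "*i2v*480*.gguf",
    "diffusion_models": "*t2v*1.3*",
    "text_encoders": "*t5*",
    "vae": "*vae*",
    "clip_vision": "*clip_vision*",
    "upscale_models": "*.pth",
}

for subdir, pattern in expected.items():
    hits = list((COMFY / subdir).glob(pattern))
    status = "OK " if hits else "MISSING"
    print(f"{status} models/{subdir:18s} {pattern}")
```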
Settings
Experiment with the denoise value in the vid2vid section. 0.1 seems like a decent baseline. Higher values should result in slightly smoother videos but lose more detail from the original; lower values should keep the original details more consistent.
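As a concrete example of what the denoise value does (assuming a 20-step sampler here, which is just an assumption, not the workflow's exact setting):

```python
# How many sampling steps actually run at a given denoise strength,
# assuming a 20-step schedule (adjust to whatever the sampler node uses).
total_steps = 20
for denoise in (0.05, 0.10, 0.20, 0.30):
    steps_run = max(1, round(total_steps * denoise))
    print(f"denoise={denoise:.2f} -> ~{steps_run} of {total_steps} steps run, "
          f"{total_steps - steps_run} skipped (more of the original kept)")
```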
Frame Length can be lowered to run on cards with less VRAM, or to create a video faster. If going over 81 frames, enable the RifleXRope node.
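To get a feel for how frame length scales the work, here is a rough latent-size calculation. The 8x spatial / 4x temporal compression and 16 latent channels are my assumptions about the Wan VAE, and real VRAM use also depends on attention and activation memory, so treat this as trend only:

```python
# Rough latent-tensor size vs. frame count for a 480x832 clip, assuming the
# VAE compresses 8x spatially and 4x temporally with 16 latent channels
# (assumptions; actual VRAM also includes attention/activation memory).
def latent_elements(frames: int, height: int = 480, width: int = 832) -> int:
    t = (frames - 1) // 4 + 1          # temporal compression (frame counts are 4k+1)
    h, w = height // 8, width // 8     # spatial compression
    return 16 * t * h * w              # 16 latent channels

base = latent_elements(81)
for frames in (33, 49, 65, 81, 97):
    n = latent_elements(frames)
    print(f"{frames:3d} frames -> latent tensor {n / 1e6:4.1f}M elements "
          f"({n / base:.2f}x the 81-frame size)")
```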
The workflow is a little dense, but it makes it easy to tweak settings quickly.