
WAN 2.2 I2V GGUF — My 8GB Daily Workflow | Upscale + RIFE - v1.0 Showcase


This is the ComfyUI workflow I use daily on my RTX 5060 8GB. I'm sharing it because getting WAN 2.2 to produce consistent quality at this VRAM level took quite a bit of trial and error, and I figure some of that research might save others time. These settings work well for me — your results may vary depending on your GPU, drivers and system setup, but this is what I keep coming back to.

It's designed to be plug and play. Load your image, write your prompt, queue. Everything else is already configured.

What it does

Two-pass generation using the WAN 2.2 Remix V2.1 Q3_K_M GGUF models, followed by a post-process pipeline that upscales and smooths the output. The final video comes out around 5 seconds long at 2x the native resolution, with RIFE frame interpolation for smoother motion.

To be honest about timing: on my RTX 5060 8GB with 32GB RAM, the full pipeline takes around 10 minutes per video. The quality I get out of it is pretty decent for this class of hardware — good enough that I kept using it. If you want faster results and don't need the upscale or RIFE, skipping those brings generation time down to around 400 seconds.

What you need

System: 8GB VRAM minimum, 32GB RAM recommended (the inactive model offloads to RAM between passes).

Models:

  • wan22RemixT2VI2V_i2vHighV21-Q3_K_M.gguf

  • wan22RemixT2VI2V_i2vLowV21-Q3_K_M.gguf

  • umt5-xxl-encoder-Q5_K_M.gguf

  • wan_2.1_vae.safetensors

  • 4x-UltraSharp.pth

Custom nodes: ComfyUI-GGUF, KJNodes, ComfyUI-NAG, VideoHelperSuite, ComfyUI-Frame-Interpolation, was-node-suite.

How to use

  1. Import the JSON in ComfyUI via Load (API format), or queue it from a script as sketched after this list

  2. In node 106 load your input image

  3. In node 6 write your animation prompt

  4. Queue and wait
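
If you'd rather drive it from a script, ComfyUI's HTTP API accepts the same JSON. A minimal sketch, assuming a default local install at 127.0.0.1:8188 and that node 106 is a LoadImage node and node 6 the prompt text node, as described above (the workflow filename is a placeholder):

# Queue the workflow through ComfyUI's HTTP API instead of the UI.
import json
import urllib.request

with open("wan22_i2v_workflow.json") as f:            # placeholder filename
    workflow = json.load(f)

workflow["106"]["inputs"]["image"] = "my_input.png"   # must already be in ComfyUI's input folder
workflow["6"]["inputs"]["text"] = "a woman turns her head and smiles, soft light"

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())    # returns the queued prompt id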

Portrait orientation images work best. The workflow auto-crops your image to 480×848. Very dark images or images with busy backgrounds tend to produce less consistent results.
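
If you'd like to pre-crop outside ComfyUI, this Pillow sketch is roughly what the auto-crop stage does (the actual node's resampling and centering may differ):

# Center-crop and resize an image to the workflow's 480x848 working resolution.
from PIL import Image, ImageOps

img = Image.open("my_input.png")
img = ImageOps.fit(img, (480, 848), method=Image.Resampling.LANCZOS)
img.save("my_input_480x848.png")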

Why these settings

The core of this workflow is running euler on the HIGH pass and heun on the LOW pass. The reason heun matters specifically here is Q3_K_M quantization. heun is a predictor-corrector sampler — it evaluates the model twice per step and self-corrects, which cancels out the precision errors that Q3 compression introduces. Multi-step samplers like res_2m or dpmpp_2m carry those errors forward across steps and you end up with visible pixelation by step 4 or 5. I went through most of the obvious options before landing on this.
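
To make the predictor-corrector point concrete, here is the textbook difference between the two update rules on a generic step (plain Python for illustration, not WAN's actual sampler code):

# Euler vs. Heun updates for dx/dt = f(x, t).
# Heun evaluates f twice: a predictor step, then an averaged corrector,
# so a one-step error (e.g. Q3 quantization noise) gets damped instead of
# carried forward the way a multi-step history-based sampler would carry it.

def euler_step(f, x, t, dt):
    return x + dt * f(x, t)              # one model evaluation per step

def heun_step(f, x, t, dt):
    k1 = f(x, t)                         # predictor slope
    x_pred = x + dt * k1                 # Euler prediction
    k2 = f(x_pred, t + dt)               # corrector slope at the prediction
    return x + dt * 0.5 * (k1 + k2)      # average the two slopes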

CFG is 1.5 on HIGH and 1.0 on LOW. The slightly higher CFG on HIGH locks in facial structure early in the denoising process. On LOW, anything above about 1.2 with Q3 models produces dark shadow artifacts around high-frequency detail — I ran into this hard at CFG 3.0 and it looked terrible. The split avoids this completely.
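
For reference, this is the standard classifier-free guidance blend (not WAN-specific). At cfg = 1.0 the unconditional term cancels out, which is why the LOW pass runs without any guidance amplification:

# Standard CFG blend; at cfg == 1.0 this reduces to the conditional
# prediction exactly, so nothing gets amplified.
def cfg_blend(cond, uncond, cfg):
    return uncond + cfg * (cond - uncond)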

More steps go to LOW (4) than HIGH (2) because in WAN 2.2's MoE architecture, the LOW expert is responsible for facial identity and detail. More refinement passes there means better consistency across frames.
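
In advanced-sampler terms the split looks something like the sketch below. This is my guess at the layout, expressed with ComfyUI's KSamplerAdvanced parameter names, so verify against the actual JSON: the HIGH model takes the first slice of the step range and hands its noisy latent to the LOW model for the rest.

# Hypothetical two-pass split over 6 total steps (2 HIGH + 4 LOW).
high_pass = dict(sampler_name="euler", cfg=1.5, steps=6,
                 start_at_step=0, end_at_step=2,
                 add_noise="enable", return_with_leftover_noise="enable")
low_pass  = dict(sampler_name="heun", cfg=1.0, steps=6,
                 start_at_step=2, end_at_step=6,
                 add_noise="disable", return_with_leftover_noise="disable")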

Post-process pipeline:

VAEDecode → FastUnsharpSharpen (0.8) → 4x-UltraSharp → ×0.5 scale → RIFE ×2 → H.264 CRF 17 @ 32 fps

That's a net 2x upscale plus doubled frames. If you want even smoother output, change the RIFE multiplier in node 310 from 2 to 4 — no extra VRAM needed since RIFE runs on already-decoded frames.
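
Worked out on the workflow's own numbers (assuming WAN's usual 16 fps native output; frame counts are approximate):

# Net effect of the post chain.
w, h = 480, 848          # working resolution after auto-crop
w, h = w * 4, h * 4      # 4x-UltraSharp  -> 1920 x 3392
w, h = w // 2, h // 2    # x0.5 scale     -> 960 x 1696 (net 2x)
fps = 16 * 2             # RIFE x2        -> 32 fps, matching the H.264 output
print(w, h, fps)         # 960 1696 32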

NAG (Normalized Attention Guidance) is applied to both models at scale 11, alpha 0.25, tau 2.5. It works through attention normalization rather than CFG scaling so it doesn't trigger the shadow artifacts you'd get from pushing CFG higher. FBCache runs on both models at threshold 0.12 for a speed boost with minimal quality impact.

RTX 5000 series (Blackwell) — read this

SageAttention is patched on both models in auto mode. Don't change this to an explicit backend. On RTX 5060, 5070, 5080 and 5090 (sm_120 compute capability), the sageattn_qk_int8_pv_fp16_cuda backend crashes outright — those CUDA kernels haven't been compiled for Blackwell yet. auto detects the right implementation at runtime and works correctly.

If you're on an RTX 5000 series card, also use PyTorch cu130 (CUDA 13.0, compiled for sm_120) and launch ComfyUI with --disable-async-offload. The cu128 build underperforms on Blackwell and async offload has some instability on sm_120.
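
A quick way to confirm what you're running on before touching any of this (standard PyTorch calls):

# Blackwell consumer cards (RTX 5000 series) report compute capability 12.0.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"sm_{major}{minor}")      # prints "sm_120" on an RTX 5060
if (major, minor) >= (12, 0):
    print("Blackwell: keep SageAttention on auto, use the cu130 build, "
          "and launch with --disable-async-offload")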

NVFP4 quantized WAN models are not available yet but when they are, Blackwell's FP4 tensor cores will handle them natively — expect a significant quality and speed uplift for this GPU family when that happens.

Variants

+ LightX2V — if you want faster generation at the cost of some quality, add the LightX2V LoRAs on top of this workflow:

  • HIGH: wan2.2_i2v_A14b_high_noise_lora_rank64_lightx2v_4step_1022.safetensors at strength 1.5

  • LOW: wan2.2_i2v_A14b_low_noise_lora_rank64_lightx2v_4step_1022.safetensors at strength 1.0

I find the base workflow without LoRAs gives better quality at the same step count, but LightX2V is useful when you want faster iteration.
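
For reference, in API-format JSON each LoRA is one extra node spliced between the UNet loader and the sampler. A sketch using ComfyUI's LoraLoaderModelOnly node; the node ID and the upstream link are invented for illustration:

# Hypothetical API-format entry for the HIGH-pass LoRA.
lora_high = {
    "class_type": "LoraLoaderModelOnly",
    "inputs": {
        "lora_name": "wan2.2_i2v_A14b_high_noise_lora_rank64_lightx2v_4step_1022.safetensors",
        "strength_model": 1.5,
        "model": ["<high_unet_loader_id>", 0],   # wire to the HIGH GGUF loader output
    },
}
# The LOW-pass LoRA is identical with the low-noise file at strength 1.0.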

+ SLG Face Lock — I've also written a custom node (SkipLayerGuidanceWAN) that adds zero-cost face lock on the LOW pass. With heun the difference is subtle since the corrector already stabilizes identity reasonably well, but it's there if you want it. I'll publish it as a separate post.

What didn't work

Noting these because they come up a lot and I spent time on all of them:

  • res_2m and res_3s both pixelate from quantization error accumulation

  • uniform CFG 3.0 creates heavy shadow artifacts with Q3

  • MagCache throws a division by zero with heun and the beta scheduler

  • the karras scheduler is designed for noise-prediction architectures and doesn't fit WAN's flow matching

  • the explicit SageAttention CUDA backend crashes on Blackwell

Note on the negative prompt

Please don't remove the Chinese-language terms in the negative prompt. They're part of WAN's official negative conditioning and work at the model level, not just as text guidance. Removing them noticeably affects output quality.

v1.0 — initial release

If something doesn't work for you or you get better results with different settings, drop it in the comments. Always curious what others are finding on different hardware.
