Quality comparison between multiple precision formats
Comparisons between the formats, using a mixture of my models.
The comparison videos in each batch are generated with the same seeds and prompts.
Quantization types and quality estimation
There are many types of quantization. The most common "high-precision" formats are FP32 (full precision), BF16 (Brain Float), and FP16 (half precision). For more aggressive compression, we use FP8 (plain or scaled/mixed) and 4-bit formats like NF4 (NormalFloat) or NVFP4 (NVIDIA's specialized 4-bit float).
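As a rough, model-agnostic illustration of what "half precision" buys: casting FP32 weights to FP16 halves the storage while introducing only a small rounding error. The tensor below is a hypothetical stand-in for real model weights.

```python
import numpy as np

# Hypothetical weights; real model tensors are far larger.
rng = np.random.default_rng(0)
w_fp32 = rng.standard_normal(1024).astype(np.float32)

# FP16 stores the same values at half the byte cost.
w_fp16 = w_fp32.astype(np.float16)

print(w_fp32.nbytes, w_fp16.nbytes)  # 4096 vs 2048 bytes
# Rounding error stays tiny relative to the weight magnitudes.
print(np.abs(w_fp32 - w_fp16.astype(np.float32)).max())
```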
GGUF is a binary container format, but it is not "just" a box. While Safetensors usually stores raw weights (FP16 or BF16, aside from mixed-format checkpoints), GGUF is designed to store quantized weights using specific algorithms like K-Quants (e.g., Q4_K_M) or I-Quants (which use an importance matrix). These are quantization methods originally developed for llama.cpp that allow "mixed-bit" storage, where different layers of the model are compressed at different intensities.
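The core idea behind Q8_0-style quantization is block-wise storage: weights are split into small blocks (32 values), each stored as int8 plus one scale factor. The sketch below illustrates that idea only; it is not the exact llama.cpp/GGUF bit layout.

```python
import numpy as np

def quantize_q8_blockwise(w, block=32):
    """Simplified Q8_0-style quantization: int8 values + one scale per block.
    Illustrative only -- real GGUF blocks have a specific binary layout."""
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize(q, scale):
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
q, s = quantize_q8_blockwise(w)
err = np.abs(w - dequantize(q, s)).max()
print(err)  # small per-block rounding error
```

Lower-bit quants (Q5, Q4, Q3, Q2) shrink the per-value budget further, which is why the rounding error, and the visual artifacts, grow as you move down the list below.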
BF16 / FP16 (The Gold Standard)
Quality: ⭐⭐⭐⭐⭐
No loss in motion consistency or fine texture.
Speed: Baseline. Requires high-end hardware (e.g., 48GB+ VRAM for the 14B model).
LoRA Compatibility: Native/Perfect. Most Wan 2.2 LoRAs are trained and tested on this precision.
FP8 (Scaled / Mixed)
Quality: ⭐⭐⭐⭐✬
Nearly indistinguishable from BF16, though minor "flicker" can occur in complex textures.
Speed: Fastest on modern NVIDIA GPUs (40-series/H100) due to native FP8 hardware acceleration.
LoRA Compatibility: Excellent. Most modern workflows (ComfyUI/Diffusers) support applying LoRAs directly to FP8 weights with minimal shift.
GGUF Q8_0 / Q6_K
Quality: ⭐⭐⭐⭐
Extremely high retention of the MoE (Mixture of Experts) logic in Wan 2.2.
Speed: Moderate. Slower than FP8 on high-end GPUs, but highly efficient for CPU/system RAM offloading.
LoRA Compatibility: Good. Requires specific loaders (like ComfyUI-GGUF) to patch LoRAs into the quantized weights.
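Patching a LoRA into quantized weights boils down to: dequantize, merge the low-rank delta (B @ A, scaled by alpha), then requantize. The sketch below shows that idea on toy row-quantized int8 weights; real loaders such as ComfyUI-GGUF handle the actual GGUF block formats internally, and the shapes here are hypothetical.

```python
import numpy as np

def apply_lora_to_quantized(q, scale, lora_A, lora_B, alpha=1.0):
    """Merge a LoRA into row-quantized int8 weights:
    dequantize -> add the low-rank delta -> requantize.
    A simplified sketch, not a real GGUF loader."""
    w = q.astype(np.float32) * scale           # dequantize
    w = w + alpha * (lora_B @ lora_A)          # merge low-rank update
    new_scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    return np.round(w / new_scale).astype(np.int8), new_scale

# Hypothetical toy weight matrix (8x16) and a rank-2 LoRA.
rng = np.random.default_rng(0)
w0 = rng.standard_normal((8, 16)).astype(np.float32)
scale0 = np.abs(w0).max(axis=1, keepdims=True) / 127.0
q0 = np.round(w0 / scale0).astype(np.int8)
A = rng.standard_normal((2, 16)).astype(np.float32) * 0.1
B = rng.standard_normal((8, 2)).astype(np.float32) * 0.1
q1, s1 = apply_lora_to_quantized(q0, scale0, A, B)
```

The requantization step is where the precision ceiling bites: at 8 bits the merged delta survives almost intact, while at 4 bits and below it gets rounded away, which is the "smearing" and likeness loss described in the lower-bit entries.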
GGUF Q5_K_M / Q5_0
Quality: ⭐⭐⭐⭐
High retention of "micro-movements" (eye blinks, finger articulation) that often turn jittery in 4-bit. It holds the MoE expert routing stability nearly as well as Q6.
Speed: Balanced. Significant VRAM savings over FP8/Q8, allowing the 14B model to run comfortably on 8GB cards with room for resolution context.
LoRA Compatibility: Good. Unlike 4-bit, Q5 has enough "headroom" to maintain the likeness of specific faces or textures from a LoRA without the "smearing" effect.
NF4 (4-bit NormalFloat)
Quality: ⭐⭐⭐✬
Good for general composition, but you may notice loss in "cinematic" fine details and text rendering.
Speed: Very Fast. Great for mid-range cards (e.g., RTX 3060/4060) to avoid OOM (Out of Memory) errors.
LoRA Compatibility: Fair. LoRA "smearing" can occur where the adapter's effect feels less precise or overly aggressive.
GGUF Q4_K_M / NVFP4
Quality: ⭐⭐⭐
Significant compression. Motion may become slightly more "robotic" or jittery in Wan 2.2's 14B experts.
Speed: High Efficiency. NVFP4 is specifically tuned for speed on Blackwell/Ada architectures.
LoRA Compatibility: Moderate. Fine-tuned details from LoRAs (like specific faces) may lose likeness.
GGUF Q3_K_M / Q3_K_L
Quality: ⭐⭐✬
Dynamic motion remains, but fine textures (hair, skin) become "mushy." Prompt adherence begins to slip.
Speed: Fast. Allows the 14B model to run on 8GB-10GB VRAM cards.
LoRA Compatibility: Weak. LoRAs may fail to "trigger" properly or cause significant color/artifacting issues.
GGUF Q2_K / IQ2_XS
Quality: ⭐
Significant "hallucination" in video frames; Wan 2.2 may struggle to keep the MoE experts synchronized.Speed: Very Fast (but low utility).
LoRA Compatibility: Poor. Most LoRAs will fail to produce recognizable results at this level of degradation.

