Comparison SAFETENSORS FP8+ vs GGUF (Q8-Q2) vs NVFP4

Quality comparison between multiple precision formats

Comparisons between the formats, using a mixture of my models.

The comparison videos within each batch are made with the same seeds and prompts.

Quantization types and quality estimation

There are many types of quantization. The most common "high-precision" formats are FP32 (full precision), BF16 (Brain Float 16), and FP16 (half precision). For more aggressive compression we use FP8 (plain or mixed/scaled) and 4-bit formats such as NF4 (NormalFloat) or NVFP4 (NVIDIA's specialized 4-bit float).
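As a rough illustration of what bit width costs, here is a minimal sketch (plain Python, not any library's actual kernel) of uniform symmetric quantization, showing how round-trip error grows as precision drops:

```python
# Illustrative sketch only: quantize a few weights to a signed integer grid
# of a given bit width, then dequantize, and compare against the originals.

def quantize_roundtrip(weights, bits):
    """Quantize to signed integers of the given bit width, then dequantize."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) * scale for w in weights]

weights = [0.82, -0.35, 0.07, -1.20, 0.55]
for bits in (8, 4, 2):
    restored = quantize_roundtrip(weights, bits)
    err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{bits}-bit max round-trip error: {err:.4f}")
```

At 8 bits the grid is fine enough that the error is visually negligible; at 2 bits every weight snaps to one of only three values, which is why Q2-class models hallucinate.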

GGUF is a binary container format, but it is not "just" a box. While Safetensors usually stores raw weights (such as FP16 or BF16), except in mixed-format checkpoints, GGUF is designed to store quantized weights using specific schemes like K-Quants (e.g., Q4_K_M) or I-Quants (importance-matrix quants). These quantization methods, developed for llama.cpp, allow "mixed-bit" storage, where different layers of the model are compressed at different intensities.
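The core idea behind GGUF's block quantization can be sketched with its simplest type, Q8_0. This is a simplification: real ggml stores the per-block scale as fp16 and packs blocks into a binary buffer, and the helper names here are made up for illustration; the block size of 32 does match ggml's Q8_0.

```python
import random

BLOCK = 32  # ggml's Q8_0 quantizes weights in blocks of 32

def q8_0_block(block):
    """One scale per block; each weight is stored as a signed 8-bit integer."""
    scale = max(abs(w) for w in block) / 127 or 1.0  # guard all-zero blocks
    quants = [round(w / scale) for w in block]       # int8 payload
    return scale, quants

def dequant(scale, quants):
    return [q * scale for q in quants]

random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(BLOCK)]
scale, quants = q8_0_block(weights)
restored = dequant(scale, quants)
print("max error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

K-Quants refine this with nested "scales of scales" and, as the article notes, different bit widths per layer; the per-block scale is also why GGUF files are slightly larger than their nominal bit width suggests.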

BF16 / FP16 (The Gold Standard)

  • Quality: ⭐⭐⭐⭐⭐
    No loss in motion consistency or fine texture.

  • Speed: Baseline. Requires high-end hardware (e.g., 48GB+ VRAM for the 14B model).

  • LoRA Compatibility: Native/Perfect. Most Wan 2.2 LoRAs are trained and tested on this precision.

FP8 (Scaled / Mixed)

  • Quality: ⭐⭐⭐⭐✬
    Nearly indistinguishable from BF16, though minor "flicker" can occur in complex textures.

  • Speed: Fastest on modern NVIDIA GPUs (40-series/H100) due to native FP8 hardware acceleration.

  • LoRA Compatibility: Excellent. Most modern workflows (ComfyUI/Diffusers) support applying LoRAs directly to FP8 weights with minimal shift.

GGUF Q8_0 / Q6_K

  • Quality: ⭐⭐⭐⭐
    Extremely high retention of the MoE (Mixture of Experts) logic in Wan 2.2.

  • Speed: Moderate. Slower than FP8 on high-end GPUs, but highly efficient for CPU/System RAM offloading.

  • LoRA Compatibility: Good. Requires specific loaders (like ComfyUI-GGUF) to patch LoRAs into the quantized weights.

GGUF Q5_K_M / Q5_0

  • Quality: ⭐⭐⭐⭐

  • Visuals: High retention of "micro-movements" (eye blinks, finger articulation) that often turn jittery in 4-bit. It holds the MoE expert routing stability nearly as well as Q6.

  • Speed: Balanced. Significant VRAM savings over FP8/Q8, allowing the 14B model to run comfortably on 8GB cards with headroom left for higher resolutions.

  • LoRA Compatibility: Good. Unlike 4-bit, Q5 has enough "headroom" to maintain the likeness of specific faces or textures from a LoRA without the "smearing" effect.

NF4 (4-bit NormalFloat)

  • Quality: ⭐⭐⭐✬
    Good for general composition, but you may notice loss in "cinematic" fine details and text rendering.

  • Speed: Very Fast. Great for mid-range cards (e.g., RTX 3060/4060) to avoid OOM (Out of Memory) errors.

  • LoRA Compatibility: Fair. LoRA "smearing" can occur where the adapter's effect feels less precise or overly aggressive.

GGUF Q4_K_M / NVFP4

  • Quality: ⭐⭐⭐
    Significant compression. Motion may become slightly more "robotic" or jittery in Wan 2.2's 14B experts.

  • Speed: High Efficiency. NVFP4 is specifically tuned for speed on Blackwell/Ada architectures.

  • LoRA Compatibility: Moderate. Fine-tuned details from LoRAs (like specific faces) may lose likeness.

GGUF Q3_K_M / Q3_K_L

  • Quality: ⭐⭐✬
    Dynamic motion remains, but fine textures (hair, skin) become "mushy." Prompt adherence begins to slip.

  • Speed: Fast. Allows the 14B model to run on 8GB-10GB VRAM cards.

  • LoRA Compatibility: Weak. LoRAs may fail to "trigger" properly, or may cause significant color shifts and artifacts.

GGUF Q2_K / IQ2_XS

  • Quality:
    Significant "hallucination" in video frames; Wan 2.2 may struggle to keep the MoE experts synchronized.

  • Speed: Very Fast (but low utility).

  • LoRA Compatibility: Poor. Most LoRAs will fail to produce recognizable results at this level of degradation.
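A back-of-envelope way to compare the formats above is weight-only memory for a 14B-parameter model. The effective bits-per-weight figures below are my assumptions, not official numbers (K-Quants sit above their nominal bit width because of per-block scales), and activations, latents, and the text encoder all come on top:

```python
# Rough weight-only footprint of a 14B model. The bits-per-weight values are
# assumed approximations (incl. block-scale overhead); treat as ballpark only.
PARAMS = 14e9
bits_per_weight = {
    "BF16/FP16": 16.0,
    "FP8":        8.0,
    "Q8_0":       8.5,
    "Q5_K_M":     5.5,
    "Q4_K_M":     4.85,
    "NF4/NVFP4":  4.5,
    "Q3_K_M":     3.9,
    "Q2_K":       2.6,
}
for name, bpw in bits_per_weight.items():
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{name:10s} ~{gib:5.1f} GiB")
```

BF16 weights alone land around 26 GiB, which is why the "Gold Standard" row above calls for 48GB+ cards once activations are included, while the Q3/Q4 rows fit the 8GB-10GB range quoted for those quants.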

Examples

~ (FP8+ vs NVFP4)
soon...

SynthSeduction v9

FP8+ vs GGUF vs Basic

TastySin v8

GGUF Q8-Q2
