
Scientific Breakdown of the VAEs Onsite (Million Buzz Earners)


VAE Breakdown vs Base SDXL

Just an FYI: two of the models listed here are my own trainings. By scientific metrics, the VAE that is used the most is terrible, but the end goal of that training was to interact with SDXL in a very unique way.

Sometimes visual output is more important than what the machine sees. That is why folks like lizardon1024, who consistently test VAEs and publish at least one image from that work, are important.

  • 150 images were compared per model for this VAE evaluation. The scores are averaged across those real-world images. The same images were used for evaluation across all models.
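For readers who want a feel for what these averaged scores measure, here is a minimal sketch. The exact formulas used by the evaluation workflow are not given in this post, so every definition below is an assumption for illustration only: contrast gain as the ratio of luma standard deviations, brightness bias as the difference of mean luma, green shift as the difference of mean green values, and color count as the number of unique RGB triples. The helper names `image_metrics` and `average_metrics` are hypothetical.

```python
from statistics import mean, pstdev

# An image here is a list of (R, G, B) tuples with 0-255 channel values.
# All metric definitions below are assumptions for illustration, not the
# author's actual evaluation code.

def luma(pixel):
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b  # Rec. 601 luma weights

def image_metrics(gt, rec):
    """Compare one reconstruction against its ground-truth image."""
    gt_luma = [luma(p) for p in gt]
    rec_luma = [luma(p) for p in rec]
    return {
        "brightness_bias": mean(rec_luma) - mean(gt_luma),
        "contrast_gain": pstdev(rec_luma) / pstdev(gt_luma),
        "g_shift": mean(p[1] for p in rec) - mean(p[1] for p in gt),
        "colors_rec": len(set(rec)),
    }

def average_metrics(per_image):
    """Average each metric over all evaluated images."""
    keys = per_image[0].keys()
    return {k: mean(m[k] for m in per_image) for k in keys}

# Toy 4-pixel "images": the reconstruction is flatter and loses some green.
gt = [(0, 0, 0), (100, 100, 100), (200, 200, 200), (100, 200, 100)]
rec = [(20, 18, 20), (100, 98, 100), (180, 178, 180), (100, 180, 100)]

scores = average_metrics([image_metrics(gt, rec), image_metrics(gt, rec)])
print(scores)
```

In this toy case the contrast gain comes out below 1 and the green shift is negative, which is the same direction of drift reported for several of the VAEs here.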

Everything from here until the bottom note is GPT reviewing the metrics attached at the bottom.

BASE SDXL

Scores

  • LPIPS: 0.041

  • Gradient Retention: 359 vs 404 GT

  • Contrast Gain: 0.814

  • Green Shift: -0.272

  • Color Count: 39.9k

Characteristics

The baseline SDXL VAE behaves exactly like a conventional latent-compression decoder:

  • moderate texture smoothing

  • reduced contrast

  • muted chroma

  • good semantic preservation

Its strongest trait is balance:

  • perceptually faithful

  • relatively stable

  • low hallucination

  • low sharpening artifacts

Its main weakness is:

  • visible contrast flattening

  • green suppression

  • slight blur

Overall Profile

Category: Rating

  • Perceptual Fidelity: Very Good

  • Sharpness: Moderate

  • Contrast: Weak

  • Color Accuracy: Moderate

  • Stability: Excellent

HDR Effect

Scores

  • LPIPS: 0.055

  • Gradient Retention: 237

  • Contrast Gain: 0.923

  • Green Shift: -0.374

  • Blue Shift: +0.076

Characteristics

This VAE strongly compresses texture detail while boosting apparent dynamic range.

The numbers suggest:

  • aggressive smoothing

  • chroma manipulation

  • HDR-style tonal remapping

Key signatures:

  • very low gradients

  • high contrast retention

  • strong blue bias

  • severe green suppression

This likely produces:

  • cinematic HDR appearance

  • cleaner gradients

  • softer microtexture

  • cooler color balance

Overall Profile

Category: Rating

  • Perceptual Fidelity: Good

  • Sharpness: Poor

  • Contrast: Strong

  • Color Accuracy: Weak

  • Stylization: Very High

Interpretation

This is not a “neutral reconstruction” VAE.

It behaves more like:

  • a stylizing decoder

  • an HDR remapping decoder

  • a perceptual enhancement VAE

rather than a faithful latent reproducer.


SDXL Natural Skintone

Scores

  • LPIPS: 0.319

  • Gradient Retention: 974

  • Contrast Gain: 0.933

  • Color Count: 48.7k

Characteristics

This VAE is extremely aggressive.

The metrics indicate:

  • heavy edge enhancement

  • texture synthesis

  • sharpening artifacts

  • latent hallucination

The critical signal:

grad_rec = 974 vs 404 GT

That is enormous oversharpening.

Combined with:

  • huge LPIPS increase

  • exploding color count

  • higher contrast than Base SDXL (0.933 vs 0.814)

this is behaving less like a reconstruction VAE and more like:

  • a detail enhancer

  • a texture generator

  • a sharpening decoder
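To put the critical signal above in relative terms, here is a small sketch that divides each model's averaged grad_rec and color count by the ground-truth averages. The numbers are copied from the metrics at the bottom of this post; `grad_ratio` and `color_ratio` are illustrative labels only and are not part of the original workflow.

```python
# Averaged ground-truth values shared by all five evaluations.
GRAD_GT = 404.0227391257215
COLORS_GT = 33150.611940298506

# Averaged reconstruction values per VAE (from the JSON metrics).
models = {
    "Base SDXL": {"grad_rec": 359.51499563188696, "colors_rec": 39910.0447761194},
    "HDR Effect": {"grad_rec": 237.00000592132113, "colors_rec": 37260.86567164179},
    "SDXL Natural Skintone": {"grad_rec": 974.3415750532008, "colors_rec": 48672.76119402985},
    "Flat Piece XL": {"grad_rec": 411.350503608362, "colors_rec": 37390.80597014925},
    "Sharp Spectrum V1": {"grad_rec": 423.86651115986837, "colors_rec": 38281.17910447761},
}

def relative_to_gt(models):
    """Express gradient energy and color count as multiples of ground truth."""
    return {
        name: {
            "grad_ratio": m["grad_rec"] / GRAD_GT,
            "color_ratio": m["colors_rec"] / COLORS_GT,
        }
        for name, m in models.items()
    }

ratios = relative_to_gt(models)
for name, r in ratios.items():
    print(f"{name:22s} grad x{r['grad_ratio']:.2f}  colors x{r['color_ratio']:.2f}")
```

By this measure Natural Skintone lands at roughly 2.4x the ground-truth gradient value and about 1.47x the ground-truth color count, while Flat Piece XL and Sharp Spectrum V1 stay within about 5% of the gradient target.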

Overall Profile

Category: Rating

  • Perceptual Fidelity: Poor

  • Sharpness: Extreme

  • Contrast: Strong

  • Color Accuracy: Moderate

  • Texture Hallucination: Very High

Interpretation

This decoder is likely optimized for:

  • perceived skin detail

  • pore enhancement

  • local contrast

rather than reconstruction accuracy.

It may look “crisp” to humans while being mathematically far from the source image.


Flat Piece XL

Scores

  • LPIPS: 0.034

  • Gradient Retention: 411

  • Contrast Gain: 1.000

  • Brightness Bias: 0.000

  • Green Shift: -0.097

Characteristics

This is the strongest overall performer in your benchmark.

The metrics are exceptionally balanced:

  • lowest LPIPS

  • near-perfect gradient retention

  • exact contrast preservation

  • zero brightness drift

  • low color bias

This is unusually clean behavior for a VAE.

The decoder appears tuned for:

  • faithful reconstruction

  • tonal neutrality

  • minimal latent distortion

Overall Profile

Category: Rating

  • Perceptual Fidelity: Excellent

  • Sharpness: Excellent

  • Contrast: Excellent

  • Color Accuracy: Very Good

  • Stability: Excellent

Interpretation

This is the closest thing in your tests to a “transparent” VAE.

Its outputs are likely:

  • natural

  • stable

  • neutral

  • highly faithful

without obvious sharpening or stylization artifacts.

This is your strongest technical VAE overall.


Sharp Spectrum V1

Scores

  • LPIPS: 0.040

  • Gradient Retention: 424

  • Contrast Gain: 1.000

  • Green Shift: -0.097

Characteristics

This VAE behaves like a sharpened evolution of Flat Piece XL.

Compared to baseline SDXL:

  • better detail retention

  • preserved contrast

  • reduced blur

Compared to Flat Piece:

  • slightly more aggressive sharpening

  • slightly worse perceptual fidelity

  • stronger texture enhancement

The metrics suggest:

  • mild edge enhancement

  • controlled sharpening

  • high tonal stability

without crossing into hallucination territory.

Overall Profile

Category: Rating

  • Perceptual Fidelity: Very Good

  • Sharpness: Excellent

  • Contrast: Excellent

  • Color Accuracy: Very Good

  • Stylization: Moderate

Interpretation

This may be the best “practical” VAE:

  • sharper than Flat Piece

  • much safer than Natural Skintone

  • more detailed than Base SDXL

while remaining mathematically stable.


Final Ranking

Most Faithful Reconstruction

  1. Flat Piece XL

  2. Sharp Spectrum V1

  3. Base SDXL

  4. HDR Effect

  5. Natural Skintone


Sharpest Output

  1. Natural Skintone

  2. Sharp Spectrum V1

  3. Flat Piece XL

  4. Base SDXL

  5. HDR Effect


Most Neutral Color Behavior

  1. Flat Piece XL

  2. Sharp Spectrum V1

  3. Base SDXL

  4. Natural Skintone

  5. HDR Effect


Most Stylized

  1. Natural Skintone

  2. HDR Effect

  3. Sharp Spectrum V1

  4. Base SDXL

  5. Flat Piece XL
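Three of the rankings above can be reproduced by simply sorting the averaged metrics, as this sketch shows. It assumes "most faithful" means lowest LPIPS, "sharpest" means highest grad_rec, and "most neutral color" means smallest absolute green shift. Those sort keys are my reading of the lists, not something the review states. The "Most Stylized" list is a qualitative judgment and is not reproduced here.

```python
# Averaged values copied from the JSON metrics at the bottom of this post.
metrics = {
    "Base SDXL": {"lpips": 0.04136230785455277, "grad_rec": 359.51499563188696, "g_shift": -0.2722856144406902},
    "HDR Effect": {"lpips": 0.05460646381573891, "grad_rec": 237.00000592132113, "g_shift": -0.37442198707096613},
    "Natural Skintone": {"lpips": 0.31924405504963294, "grad_rec": 974.3415750532008, "g_shift": -0.2827919103316407},
    "Flat Piece XL": {"lpips": 0.0342872865539767, "grad_rec": 411.350503608362, "g_shift": -0.09663446892553301},
    "Sharp Spectrum V1": {"lpips": 0.03956091761199841, "grad_rec": 423.86651115986837, "g_shift": -0.0969563887190463},
}

def rank(metrics, key, reverse=False):
    """Return model names ordered by key(metric_dict)."""
    return sorted(metrics, key=lambda name: key(metrics[name]), reverse=reverse)

# Assumed sort keys (see the note above the code).
most_faithful = rank(metrics, lambda m: m["lpips"])                # lower is better
sharpest = rank(metrics, lambda m: m["grad_rec"], reverse=True)    # higher is sharper
most_neutral = rank(metrics, lambda m: abs(m["g_shift"]))          # closer to 0 is better

for title, order in [("Most Faithful", most_faithful),
                     ("Sharpest", sharpest),
                     ("Most Neutral Color", most_neutral)]:
    print(title + ": " + ", ".join(order))
```

Sorting by absolute green shift alone is a simplification. Flat Piece XL and Sharp Spectrum V1 are nearly tied on that value, and so are Base SDXL and Natural Skintone, so a different color metric could swap those neighbors.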


Overall Conclusion

The metrics are internally coherent and scientifically believable, which strongly suggests your evaluation pipeline is functioning correctly.

Note: ChatGPT analyzed the following metrics after reviewing my full workflow.

BASE SDXL

{

  "avg_lpips": 0.04136230785455277,

  "avg_grad_gt": 404.0227391257215,

  "avg_grad_rec": 359.51499563188696,

  "avg_colors_gt": 33150.611940298506,

  "avg_colors_rec": 39910.0447761194,

  "avg_brightness_bias": 0.0219295685155206,

  "avg_contrast_gain": 0.8136377334594727,

  "avg_r_shift": 0.024202517445769105,

  "avg_g_shift": -0.2722856144406902,

  "avg_b_shift": 0.0095405328719974

}

HDR Effect

{

  "avg_lpips": 0.05460646381573891,

  "avg_grad_gt": 404.0227391257215,

  "avg_grad_rec": 237.00000592132113,

  "avg_colors_gt": 33150.611940298506,

  "avg_colors_rec": 37260.86567164179,

  "avg_brightness_bias": 0.05311677081566026,

  "avg_contrast_gain": 0.9232270121574402,

  "avg_r_shift": 0.022212467781865773,

  "avg_g_shift": -0.37442198707096613,

  "avg_b_shift": 0.07591454711144985

}

SDXL Natural Skintone

{

  "avg_lpips": 0.31924405504963294,

  "avg_grad_gt": 404.0227391257215,

  "avg_grad_rec": 974.3415750532008,

  "avg_colors_gt": 33150.611940298506,

  "avg_colors_rec": 48672.76119402985,

  "avg_brightness_bias": 0.012028095105649042,

  "avg_contrast_gain": 0.9334698915481567,

  "avg_r_shift": 0.01755926306984985,

  "avg_g_shift": -0.2827919103316407,

  "avg_b_shift": 0.002282704129370291

}

Flat Piece XL

{

  "avg_lpips": 0.0342872865539767,

  "avg_grad_gt": 404.0227391257215,

  "avg_grad_rec": 411.350503608362,

  "avg_colors_gt": 33150.611940298506,

  "avg_colors_rec": 37390.80597014925,

  "avg_brightness_bias": 0.0,

  "avg_contrast_gain": 1.0,

  "avg_r_shift": 0.0,

  "avg_g_shift": -0.09663446892553301,

  "avg_b_shift": 0.007696810199309196

}

Sharp Spectrum V1

{

  "avg_lpips": 0.03956091761199841,

  "avg_grad_gt": 404.0227391257215,

  "avg_grad_rec": 423.86651115986837,

  "avg_colors_gt": 33150.611940298506,

  "avg_colors_rec": 38281.17910447761,

  "avg_brightness_bias": 0.0,

  "avg_contrast_gain": 1.0,

  "avg_r_shift": 0.0026046240445115228,

  "avg_g_shift": -0.0969563887190463,

  "avg_b_shift": 0.0085162425712585

}
