VAE Breakdown vs Base SDXL
Just an FYI: two of the models listed here are my trainings. By scientific metrics, the VAE that is used the most is terrible, but the end goal of that training was to interact with SDXL in a very unique way.
Sometimes visual output is more important than what the machine sees. That is why folks like lizardon1024, who consistently test VAEs and publish at least one image from their work, are important.
150 images were compared per model for this VAE evaluation. The scores are averaged across those real-world images. The same images were used for evaluation across all models.
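For readers who want to reproduce the averaging step, here is a minimal sketch. It assumes each image yields one dict of metrics, and that the `avg_*` keys in the JSON summaries at the bottom are plain arithmetic means. The function name and the sample values are hypothetical and only for illustration.

```python
from statistics import fmean

def average_metrics(per_image):
    """Average a list of per-image metric dicts into avg_* keys."""
    if not per_image:
        raise ValueError("need at least one image result")
    keys = per_image[0].keys()
    return {f"avg_{k}": fmean(r[k] for r in per_image) for k in keys}

# Hypothetical per-image results for one VAE (values invented for illustration).
results = [
    {"lpips": 0.03, "contrast_gain": 0.80},
    {"lpips": 0.05, "contrast_gain": 0.84},
]
summary = average_metrics(results)
print(summary)
```

Running every VAE over the same image list keeps the averages comparable, because each model is scored against identical ground truth.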
Everything from here until the bottom note is GPT reviewing the metrics attached at the bottom.
BASE SDXL
Scores
LPIPS: 0.041
Gradient Retention: 359 vs 404 GT
Contrast Gain: 0.814
Green Shift: -0.272
Color Count: 39.9k
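The workflow that produced these numbers is not shown in this post, so the sketch below is only one plausible way to compute metrics with the same names as the JSON keys at the bottom. Every definition here is an assumption: gradient energy as the mean squared luminance gradient, contrast gain as a ratio of standard deviations, channel shifts as mean differences. The scales will not necessarily match the reported values. LPIPS is left out because it needs a learned network (for example the `lpips` Python package).

```python
import numpy as np

def luminance(img):
    # Rec. 601 luma weights; img is float RGB with shape (H, W, 3).
    return img @ np.array([0.299, 0.587, 0.114])

def gradient_energy(img):
    # Mean squared finite-difference gradient of the luminance.
    gy, gx = np.gradient(luminance(img))
    return float(np.mean(gx**2 + gy**2))

def compare(gt, rec):
    """gt, rec: uint8 RGB arrays of identical shape (H, W, 3)."""
    g, r = gt.astype(np.float64), rec.astype(np.float64)
    shift = (r - g).mean(axis=(0, 1))  # per-channel mean difference
    return {
        "grad_gt": gradient_energy(g),
        "grad_rec": gradient_energy(r),
        "colors_gt": len(np.unique(gt.reshape(-1, 3), axis=0)),
        "colors_rec": len(np.unique(rec.reshape(-1, 3), axis=0)),
        "brightness_bias": float(luminance(r).mean() - luminance(g).mean()),
        "contrast_gain": float(luminance(r).std() / luminance(g).std()),
        "r_shift": float(shift[0]),
        "g_shift": float(shift[1]),
        "b_shift": float(shift[2]),
    }

# Synthetic demo: a "reconstruction" with contrast halved around mid-gray.
rng = np.random.default_rng(0)
gt = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
rec = np.clip(128 + 0.5 * (gt.astype(np.float64) - 128), 0, 255).astype(np.uint8)
print(compare(gt, rec))
```

Under these definitions a contrast gain below 1 means the decoded image is flatter than the source, and a `grad_rec` below `grad_gt` means edges and texture were smoothed.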
Characteristics
The baseline SDXL VAE behaves exactly like a conventional latent-compression decoder:
moderate texture smoothing
reduced contrast
muted chroma
good semantic preservation
Its strongest trait is balance:
perceptually faithful
relatively stable
low hallucination
low sharpening artifacts
Its main weakness is:
visible contrast flattening
green suppression
slight blur
Overall Profile
Category: Rating
Perceptual Fidelity: Very Good
Sharpness: Moderate
Contrast: Weak
Color Accuracy: Moderate
Stability: Excellent
HDR Effect
Scores
LPIPS: 0.055
Gradient Retention: 237 vs 404 GT
Contrast Gain: 0.923
Green Shift: -0.374
Blue Shift: +0.076
Characteristics
This VAE strongly compresses texture detail while boosting apparent dynamic range.
The numbers suggest:
aggressive smoothing
chroma manipulation
HDR-style tonal remapping
Key signatures:
very low gradients
high contrast retention
strong blue bias
severe green suppression
This likely produces:
cinematic HDR appearance
cleaner gradients
softer microtexture
cooler color balance
Overall Profile
Category: Rating
Perceptual Fidelity: Good
Sharpness: Poor
Contrast: Strong
Color Accuracy: Weak
Stylization: Very High
Interpretation
This is not a “neutral reconstruction” VAE.
It behaves more like:
a stylizing decoder
an HDR remapping decoder
a perceptual enhancement VAE
rather than a faithful latent reproducer.
SDXL Natural Skintone
Scores
LPIPS: 0.319
Gradient Retention: 974 vs 404 GT
Contrast Gain: 0.933
Color Count: 48.7k
Characteristics
This VAE is extremely aggressive.
The metrics indicate:
heavy edge enhancement
texture synthesis
sharpening artifacts
latent hallucination
The critical signal:
grad_rec = 974 vs 404 GT
That is enormous oversharpening.
Combined with:
huge LPIPS increase
exploding color count
elevated contrast relative to Base SDXL (0.933 vs 0.814)
this is behaving less like a reconstruction VAE and more like:
a detail enhancer
a texture generator
a sharpening decoder
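To put the oversharpening in proportion, the reconstructed gradient value can be divided by the ground-truth value for each model. This is a small worked calculation using the `avg_grad_rec` and `avg_grad_gt` figures from the JSON at the bottom. Reading the ratio as a "sharpening factor" is an interpretation, not something the original workflow reports.

```python
GRAD_GT = 404.0227391257215  # avg_grad_gt, identical for every model

grad_rec = {
    "Base SDXL": 359.51499563188696,
    "HDR Effect": 237.00000592132113,
    "SDXL Natural Skintone": 974.3415750532008,
    "Flat Piece XL": 411.350503608362,
    "Sharp Spectrum V1": 423.86651115986837,
}

# Ratio above 1: more gradient energy than the source; below 1: smoothing.
ratios = {name: value / GRAD_GT for name, value in grad_rec.items()}
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:<22} {ratio:.2f}x")
```

Natural Skintone comes out at roughly 2.4 times the source gradient level. Flat Piece XL and Sharp Spectrum V1 stay within about 5% of it, and HDR Effect drops to roughly 0.59.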
Overall Profile
Category: Rating
Perceptual Fidelity: Poor
Sharpness: Extreme
Contrast: Strong
Color Accuracy: Moderate
Texture Hallucination: Very High
Interpretation
This decoder is likely optimized for:
perceived skin detail
pore enhancement
local contrast
rather than reconstruction accuracy.
It may look “crisp” to humans while being mathematically far from the source image.
Flat Piece XL
Scores
LPIPS: 0.034
Gradient Retention: 411 vs 404 GT
Contrast Gain: 1.000
Brightness Bias: 0.000
Green Shift: -0.097
Characteristics
This is the strongest overall performer in your benchmark.
The metrics are exceptionally balanced:
lowest LPIPS
near-perfect gradient retention
exact contrast preservation
zero brightness drift
low color bias
This is unusually clean behavior for a VAE.
The decoder appears tuned for:
faithful reconstruction
tonal neutrality
minimal latent distortion
Overall Profile
Category: Rating
Perceptual Fidelity: Excellent
Sharpness: Excellent
Contrast: Excellent
Color Accuracy: Very Good
Stability: Excellent
Interpretation
This is the closest thing in your tests to a “transparent” VAE.
Its outputs are likely:
natural
stable
neutral
highly faithful
without obvious sharpening or stylization artifacts.
This is your strongest technical VAE overall.
Sharp Spectrum V1
Scores
LPIPS: 0.040
Gradient Retention: 424 vs 404 GT
Contrast Gain: 1.000
Green Shift: -0.097
Characteristics
This VAE behaves like a sharpened evolution of Flat Piece XL.
Compared to baseline SDXL:
better detail retention
preserved contrast
reduced blur
Compared to Flat Piece:
slightly more aggressive sharpening
slightly worse perceptual fidelity
stronger texture enhancement
The metrics suggest:
mild edge enhancement
controlled sharpening
high tonal stability
without crossing into hallucination territory.
Overall Profile
Category: Rating
Perceptual Fidelity: Very Good
Sharpness: Excellent
Contrast: Excellent
Color Accuracy: Very Good
Stylization: Moderate
Interpretation
This may be the best “practical” VAE:
sharper than Flat Piece
much safer than Natural Skintone
more detailed than Base SDXL
while remaining mathematically stable.
Final Ranking
Most Faithful Reconstruction
1. Flat Piece XL
2. Sharp Spectrum V1
3. Base SDXL
4. HDR Effect
5. Natural Skintone
Sharpest Output
1. Natural Skintone
2. Sharp Spectrum V1
3. Flat Piece XL
4. Base SDXL
5. HDR Effect
Most Neutral Color Behavior
1. Flat Piece XL
2. Sharp Spectrum V1
3. Base SDXL
4. Natural Skintone
5. HDR Effect
Most Stylized
1. Natural Skintone
2. HDR Effect
3. Sharp Spectrum V1
4. Base SDXL
5. Flat Piece XL
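The first three rankings can be rebuilt directly from the averaged metrics, which makes them easy to check. The sketch below uses values rounded from the JSON at the bottom. The sort criteria are assumptions about how the ranking was made: lowest LPIPS for fidelity, highest reconstructed gradient for sharpness, and smallest worst-case channel shift for color neutrality. "Most Stylized" is a judgment call and is not reproduced here.

```python
# Averaged metrics, rounded from the JSON summaries at the end of this post.
metrics = {
    "Base SDXL":         {"lpips": 0.0414, "grad_rec": 359.5, "shifts": (0.0242, -0.2723, 0.0095)},
    "HDR Effect":        {"lpips": 0.0546, "grad_rec": 237.0, "shifts": (0.0222, -0.3744, 0.0759)},
    "Natural Skintone":  {"lpips": 0.3192, "grad_rec": 974.3, "shifts": (0.0176, -0.2828, 0.0023)},
    "Flat Piece XL":     {"lpips": 0.0343, "grad_rec": 411.4, "shifts": (0.0, -0.0966, 0.0077)},
    "Sharp Spectrum V1": {"lpips": 0.0396, "grad_rec": 423.9, "shifts": (0.0026, -0.0970, 0.0085)},
}

# Assumed criteria; the post does not state how each ranking was derived.
faithful = sorted(metrics, key=lambda m: metrics[m]["lpips"])
sharpest = sorted(metrics, key=lambda m: metrics[m]["grad_rec"], reverse=True)
neutral = sorted(metrics, key=lambda m: max(abs(s) for s in metrics[m]["shifts"]))

for title, order in [("Most faithful", faithful),
                     ("Sharpest", sharpest),
                     ("Most neutral color", neutral)]:
    print(f"{title}: {', '.join(order)}")
```

With these criteria the three orderings match the lists above. Note that "sharpest" rewards Natural Skintone for gradient energy that the LPIPS score suggests is not present in the source.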
Overall Conclusion
The metrics are internally coherent and scientifically believable, which strongly suggests your evaluation pipeline is functioning correctly.
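One concrete coherence check: the ground-truth statistics depend only on the source images, so they should be identical in every model's summary if the same image set was used each time. Here is a minimal sketch of that check using the `avg_grad_gt` and `avg_colors_gt` values from the JSON at the bottom. The function name and tolerance are hypothetical.

```python
import math

# Ground-truth statistics copied from each model's JSON summary.
runs = {
    "Base SDXL":             {"avg_grad_gt": 404.0227391257215, "avg_colors_gt": 33150.611940298506},
    "HDR Effect":            {"avg_grad_gt": 404.0227391257215, "avg_colors_gt": 33150.611940298506},
    "SDXL Natural Skintone": {"avg_grad_gt": 404.0227391257215, "avg_colors_gt": 33150.611940298506},
    "Flat Piece XL":         {"avg_grad_gt": 404.0227391257215, "avg_colors_gt": 33150.611940298506},
    "Sharp Spectrum V1":     {"avg_grad_gt": 404.0227391257215, "avg_colors_gt": 33150.611940298506},
}

def gt_stats_match(runs, keys=("avg_grad_gt", "avg_colors_gt"), rel_tol=1e-9):
    """Return True if every run reports the same ground-truth statistics."""
    reference = next(iter(runs.values()))
    return all(
        math.isclose(run[k], reference[k], rel_tol=rel_tol)
        for run in runs.values()
        for k in keys
    )

print(gt_stats_match(runs))
```

Passing this check only shows that each VAE was scored against the same source images. It says nothing about whether the individual metrics are computed correctly.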
Note: ChatGPT analyzed the following after reviewing my full workflow.
BASE SDXL
{
"avg_lpips": 0.04136230785455277,
"avg_grad_gt": 404.0227391257215,
"avg_grad_rec": 359.51499563188696,
"avg_colors_gt": 33150.611940298506,
"avg_colors_rec": 39910.0447761194,
"avg_brightness_bias": 0.0219295685155206,
"avg_contrast_gain": 0.8136377334594727,
"avg_r_shift": 0.024202517445769105,
"avg_g_shift": -0.2722856144406902,
"avg_b_shift": 0.0095405328719974
}
HDR Effect
{
"avg_lpips": 0.05460646381573891,
"avg_grad_gt": 404.0227391257215,
"avg_grad_rec": 237.00000592132113,
"avg_colors_gt": 33150.611940298506,
"avg_colors_rec": 37260.86567164179,
"avg_brightness_bias": 0.05311677081566026,
"avg_contrast_gain": 0.9232270121574402,
"avg_r_shift": 0.022212467781865773,
"avg_g_shift": -0.37442198707096613,
"avg_b_shift": 0.07591454711144985
}
SDXL Natural Skintone
{
"avg_lpips": 0.31924405504963294,
"avg_grad_gt": 404.0227391257215,
"avg_grad_rec": 974.3415750532008,
"avg_colors_gt": 33150.611940298506,
"avg_colors_rec": 48672.76119402985,
"avg_brightness_bias": 0.012028095105649042,
"avg_contrast_gain": 0.9334698915481567,
"avg_r_shift": 0.01755926306984985,
"avg_g_shift": -0.2827919103316407,
"avg_b_shift": 0.002282704129370291
}
Flat Piece XL
{
"avg_lpips": 0.0342872865539767,
"avg_grad_gt": 404.0227391257215,
"avg_grad_rec": 411.350503608362,
"avg_colors_gt": 33150.611940298506,
"avg_colors_rec": 37390.80597014925,
"avg_brightness_bias": 0.0,
"avg_contrast_gain": 1.0,
"avg_r_shift": 0.0,
"avg_g_shift": -0.09663446892553301,
"avg_b_shift": 0.007696810199309196
}
Sharp Spectrum V1
{
"avg_lpips": 0.03956091761199841,
"avg_grad_gt": 404.0227391257215,
"avg_grad_rec": 423.86651115986837,
"avg_colors_gt": 33150.611940298506,
"avg_colors_rec": 38281.17910447761,
"avg_brightness_bias": 0.0,
"avg_contrast_gain": 1.0,
"avg_r_shift": 0.0026046240445115228,
"avg_g_shift": -0.0969563887190463,
"avg_b_shift": 0.0085162425712585
}

