qwenimageVAE_liquid1127

Updated: Jun 29, 2026

Tags: assets, anime, furry

File: qwenimageVAE_liquid1127.safetensors (bf16 SafeTensor, 242.05 MB)

Type: VAE
Stats: 183
Published: Jun 29, 2026
Base Model: Krea 2
Hash (AutoV2): 54650BD81D

qwenimageVAE_liquid1127

qwenimageVAE_liquid1127 is a lightly tuned Qwen Image VAE based on qwenimageVAE_liquid1087. It was adjusted with Krea2-style image generation in mind, but it can also be used with other Qwen Image compatible models, such as Anima and other Qwen-based workflows.

This VAE was tuned with a focus on:

  • stronger and cleaner black line rendering

  • slightly warmer pale-orange tonal gradation

  • preserving color richness without making the image look oversaturated

  • reducing unwanted pink / magenta fringing around high-contrast edges

  • maintaining brightness, contrast, and smooth reconstruction quality

  • improving stability for detailed anime-style illustrations and semi-realistic outputs

The goal of this VAE is not to drastically change the model’s style, but to subtly improve how decoded images handle linework, color boundaries, warm highlights, and edge artifacts.

This VAE is intended for Krea2 / Qwen Image based workflows, especially when generating:

  • detailed anime-style illustrations

  • character portraits with fine linework

  • images with orange, red, pink, purple, and blue color transitions

  • high-contrast lineart

  • images where magenta edge halos or pink fringing are a problem

  • outputs where the original VAE feels slightly color-thin or weak in black line definition

Tuning notes

This VAE was tuned using reconstruction-based losses, not a full style-training approach.

The purpose of the tuning was to make small decoder-side numerical adjustments while preserving the original behavior of qwenimageVAE_liquid1087 as much as possible.
The encoder was frozen, and only a very small decoder-tail portion was updated.

In this training setup, the trainable parameters were limited to decoder output-tail modules such as:

decoder.conv_out
decoder.norm_out
decoder.head
decoder.conv2

This means the VAE was not broadly retrained.
Instead, only the final output-side part of the decoder was adjusted so that line clarity, luminance balance, local contrast, color separation, and magenta-fringe behavior could be slightly corrected without changing the VAE too aggressively.
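
The tail-only selection described above can be sketched roughly as follows. This is an illustration under assumptions, not the author's training script: `ToyVAE`, `ToyDecoder`, `TAIL_PREFIXES`, and `select_tail_params` are hypothetical names, the toy decoder only mimics two of the listed module names, and prefix matching on parameter names is one possible way to pick the tail.

```python
import torch
from torch import nn

# Prefixes copied from the list above; matching by name prefix is an assumption.
TAIL_PREFIXES = ("decoder.conv_out", "decoder.norm_out", "decoder.head", "decoder.conv2")

class ToyDecoder(nn.Module):
    """Hypothetical stand-in; the real Qwen Image VAE decoder is far larger."""
    def __init__(self):
        super().__init__()
        self.conv_in = nn.Conv2d(4, 8, kernel_size=3, padding=1)
        self.norm_out = nn.GroupNorm(2, 8)
        self.conv_out = nn.Conv2d(8, 3, kernel_size=3, padding=1)

class ToyVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.decoder = ToyDecoder()

def select_tail_params(model, prefixes=TAIL_PREFIXES):
    """Freeze every parameter, then re-enable only those under the given prefixes."""
    trainable = []
    for name, param in model.named_parameters():
        is_tail = any(name.startswith(prefix + ".") for prefix in prefixes)
        param.requires_grad_(is_tail)
        if is_tail:
            trainable.append(param)
    return trainable

vae = ToyVAE()
trainable = select_tail_params(vae)
trainable_names = sorted(n for n, p in vae.named_parameters() if p.requires_grad)
print(trainable_names)
```

Only the returned `trainable` list would then be handed to the optimizer, so the encoder and the earlier decoder layers keep their original values.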

Training setup

The main training settings were:

RESOLUTION = 640
BATCH_SIZE = 1
MAX_STEPS = 2000
SAVE_EVERY = 200

LR = 1e-8
WEIGHT_DECAY = 0.0
GRAD_CLIP = 0.03

TRAIN_DECODER_ONLY = True
TRAIN_TAIL_ONLY = True
MIXED_PRECISION = "bf16"
LATENT_CLAMP = 3.0

The VAE weights themselves were kept in float32 during training for stability, while bf16 autocast was used for the forward pass.
The encoder was executed under no_grad, and decoder-only optimization was used to reduce VRAM usage and avoid destabilizing the latent distribution.
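
A minimal sketch of that arrangement is shown below: float32 weights, a bf16 autocast forward pass, and an encoder run under no_grad. The two `nn.Linear` layers are hypothetical toy stand-ins for the real encoder and decoder, `train_step` is an illustrative name, and running the encoder outside autocast is an assumption. Only the optimizer arguments are taken from the settings listed on this page.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy modules; inputs are flat 12-dimensional vectors, not images.
encoder = nn.Linear(12, 4)
decoder = nn.Linear(4, 12)
encoder.requires_grad_(False)  # frozen encoder

optimizer = torch.optim.AdamW(decoder.parameters(), lr=1e-8, weight_decay=0.0, eps=1e-4)

def train_step(x, device_type="cpu"):
    # The encoder builds no autograd graph, so it stores no activations for backward.
    with torch.no_grad():
        latents = encoder(x)
    # Parameters stay float32; autocast only lowers the precision of the forward math.
    with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
        recon = decoder(latents)
    # The loss is computed in float32.
    loss = F.l1_loss(recon.float(), x)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    return loss.item()

loss_value = train_step(torch.rand(2, 12) * 2 - 1)
print(loss_value)
```

Because the latents are produced without a graph, gradients stop at the decoder input, which is what keeps the latent distribution untouched.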

Loss weights

The final loss was a weighted combination of several reconstruction and image-quality terms:

W_RECON_L1 = 1.00
W_MULTI_SCALE = 0.14
W_EDGE = 0.10
W_LUMA = 0.14
W_CONTRAST = 0.10
W_COLOR_RICHNESS = 0.02
W_ANTI_PINK = 0.16
W_KL = 1e-6

The actual total loss was calculated as:

total_loss = (
    W_RECON_L1 * loss_recon
    + W_MULTI_SCALE * loss_ms
    + W_EDGE * loss_edge
    + W_LUMA * loss_luma
    + W_CONTRAST * loss_contrast
    + W_COLOR_RICHNESS * loss_color_richness
    + W_ANTI_PINK * loss_anti_pink
    + W_KL * loss_kl
)

Because the encoder was frozen and run under no_grad, the KL term depends only on values that receive no gradient, so it cannot influence the decoder update. It was kept as a placeholder rather than used as a training objective.

Reconstruction loss

A standard L1 reconstruction loss was used as the main anchor:

loss_recon = F.l1_loss(recon_f, target_f)

This keeps the decoded image close to the original training image and prevents the VAE from drifting too far from the base VAE behavior.

Multi-scale reconstruction loss

A multi-scale L1 loss was used to preserve both small details and larger tonal structures.

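import torch.nn.functional as F

def multiscale_l1(recon, target):
    # Full-resolution L1 term.
    loss = F.l1_loss(recon, target)
    r = recon
    t = target
    # Add L1 terms at 1/2, 1/4, and 1/8 resolution via repeated 2x average pooling.
    for _ in range(3):
        r = F.avg_pool2d(r, kernel_size=2, stride=2)
        t = F.avg_pool2d(t, kernel_size=2, stride=2)
        loss = loss + F.l1_loss(r, t)
    # Average over the four scales.
    return loss / 4.0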

This helps the VAE match not only pixel-level details, but also broader gradients, soft shading, and overall tonal structure.

Edge preservation loss

A Sobel-based edge loss was used to improve line stability and high-frequency detail reconstruction.

# y_r, y_t: luminance of the reconstruction and of the target (see luminance() below).
edge_r = sobel_mag(y_r)
edge_t = sobel_mag(y_t)
loss_edge = F.l1_loss(edge_r, edge_t * EDGE_TARGET_GAIN)

The edge target was slightly strengthened:

EDGE_TARGET_GAIN = 1.03

This was intended to make black lines, eyelashes, hair strands, clothing folds, and other fine illustration details slightly cleaner without making the result overly sharp.
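
The `sobel_mag` helper is not shown on this page. The sketch below is one plausible implementation, given as an assumption rather than the author's code: the replicate padding and the `eps` term are illustrative choices.

```python
import torch
import torch.nn.functional as F

EDGE_TARGET_GAIN = 1.03

def sobel_mag(y, eps=1e-12):
    """Sobel gradient magnitude for a single-channel batch shaped (N, 1, H, W)."""
    kx = torch.tensor(
        [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]],
        dtype=y.dtype, device=y.device,
    ).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    # Replicate padding avoids a fake edge along the image border.
    y = F.pad(y, (1, 1, 1, 1), mode="replicate")
    gx = F.conv2d(y, kx)
    gy = F.conv2d(y, ky)
    # eps keeps the square root differentiable where the gradient is exactly zero.
    return torch.sqrt(gx * gx + gy * gy + eps)

flat = torch.full((1, 1, 8, 8), 0.5)   # no edges anywhere
step = torch.zeros(1, 1, 8, 8)
step[..., 4:] = 1.0                    # one vertical black-to-white edge

# Even a perfect reconstruction has a small nonzero loss, because the target
# edge map is scaled by EDGE_TARGET_GAIN before the comparison.
loss_edge = F.l1_loss(sobel_mag(step), sobel_mag(step) * EDGE_TARGET_GAIN)
print(sobel_mag(flat).max().item(), sobel_mag(step).max().item(), loss_edge.item())
```

The gain above 1.0 means the loss is minimized when reconstructed edges are about 3% stronger than the target's, which is how the term nudges line contrast upward instead of only matching it.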

Luminance preservation loss

A luminance loss was used to reduce unwanted brightness shifts.

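def luminance(x):
    # Rec. 601 luma weights applied to the R, G, B channels.
    return 0.299 * x[:, 0:1] + 0.587 * x[:, 1:2] + 0.114 * x[:, 2:3]

y_r = luminance(recon_f)
y_t = luminance(target_f)
loss_luma = F.l1_loss(y_r, y_t)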

This helps keep the tuned VAE from making images too dark, too bright, or overly contrast-heavy compared with the source reconstruction.

Local contrast loss

A statistical contrast term, computed as the per-image standard deviation of luminance, was used to prevent the image from becoming too flat.

def contrast_stat(y):
    return y.flatten(2).std(dim=2)

loss_contrast = F.l1_loss(contrast_stat(y_r), contrast_stat(y_t))

This helps preserve separation between shadows, highlights, and lineart.

Color richness loss

A small color-richness loss was used to preserve useful chroma separation.

def chroma_richness_stat(x):
    mean = x.mean(dim=1, keepdim=True)
    chroma = (x - mean).pow(2).mean(dim=1, keepdim=True).sqrt()
    return chroma.mean(dim=(2,3))

The target chroma richness was slightly increased:

COLOR_RICHNESS_TARGET_GAIN = 1.03
rich_r = chroma_richness_stat(recon_f)
rich_t = chroma_richness_stat(target_f) * COLOR_RICHNESS_TARGET_GAIN
loss_color_richness = torch.relu(rich_t - rich_r).mean()

This was not meant to simply oversaturate the image.
The goal was to prevent the decoded result from becoming color-thin, especially in red, orange, purple, blue, and pale warm gradients.
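
The one-sided behaviour of this term can be checked on toy inputs. The sketch below reuses `chroma_richness_stat` as given above; the wrapper `color_richness_loss` and the three test images are hypothetical and exist only for illustration.

```python
import torch

COLOR_RICHNESS_TARGET_GAIN = 1.03

def chroma_richness_stat(x):
    mean = x.mean(dim=1, keepdim=True)
    chroma = (x - mean).pow(2).mean(dim=1, keepdim=True).sqrt()
    return chroma.mean(dim=(2, 3))

def color_richness_loss(recon, target):
    rich_r = chroma_richness_stat(recon)
    rich_t = chroma_richness_stat(target) * COLOR_RICHNESS_TARGET_GAIN
    # Hinge: only a reconstruction that is less colorful than the target is penalized.
    return torch.relu(rich_t - rich_r).mean()

# Solid orange target, its grayscale version, and a 20% more saturated version.
target = torch.tensor([1.0, 0.5, 0.0]).view(1, 3, 1, 1).expand(1, 3, 4, 4)
gray = target.mean(dim=1, keepdim=True).expand(1, 3, 4, 4)
vivid = gray + 1.2 * (target - gray)

loss_gray = color_richness_loss(gray, target).item()
loss_same = color_richness_loss(target, target).item()
loss_vivid = color_richness_loss(vivid, target).item()
print(loss_gray, loss_same, loss_vivid)
```

A washed-out reconstruction gets the largest penalty, an exact match gets a small one because of the 1.03 gain, and a more saturated reconstruction gets none. With a weight of only 0.02, the term acts as a floor on chroma rather than a push toward saturation.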

Anti-pink / anti-magenta fringe loss

A custom anti-fringe loss was used to suppress unwanted magenta or pink halos around high-contrast edges.

def anti_pink_fringe_loss(recon, target):
    y_t = luminance(target)
    e_t = sobel_mag(y_t)
    thr = e_t.mean(dim=(2,3), keepdim=True) * 1.20
    edge_mask = (e_t > thr).float()

    r = recon[:, 0:1]
    g = recon[:, 1:2]
    b = recon[:, 2:3]

    magenta_excess = torch.relu((0.5 * (r + b)) - g - 0.03)
    return (magenta_excess * edge_mask).mean()

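This loss applies only to edge regions detected from the target image: pixels whose Sobel magnitude on the target luminance exceeds 1.20 times that image's mean edge magnitude.
Within those regions it penalizes pixels where the average of red and blue exceeds green by more than 0.03, which usually appears visually as pink or magenta fringing.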

The intention was not to remove normal pink, purple, or red colors from the image.
Instead, it was designed to reduce unwanted magenta-tinted edge artifacts while preserving intentional warm and colorful areas.
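
A small check of that behaviour on synthetic images is sketched below. `luminance` and `anti_pink_fringe_loss` follow the snippets on this page. `sobel_mag` is an assumed implementation, since the original helper is not shown, and the three test images are hypothetical.

```python
import torch
import torch.nn.functional as F

def luminance(x):
    return 0.299 * x[:, 0:1] + 0.587 * x[:, 1:2] + 0.114 * x[:, 2:3]

def sobel_mag(y, eps=1e-12):
    # Assumed helper: Sobel gradient magnitude with replicate padding.
    kx = torch.tensor(
        [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]],
        dtype=y.dtype, device=y.device,
    ).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    y = F.pad(y, (1, 1, 1, 1), mode="replicate")
    return torch.sqrt(F.conv2d(y, kx) ** 2 + F.conv2d(y, ky) ** 2 + eps)

def anti_pink_fringe_loss(recon, target):
    y_t = luminance(target)
    e_t = sobel_mag(y_t)
    thr = e_t.mean(dim=(2, 3), keepdim=True) * 1.20
    edge_mask = (e_t > thr).float()
    r, g, b = recon[:, 0:1], recon[:, 1:2], recon[:, 2:3]
    magenta_excess = torch.relu((0.5 * (r + b)) - g - 0.03)
    return (magenta_excess * edge_mask).mean()

# Target: a vertical black-to-white edge between columns 3 and 4.
target = torch.zeros(1, 3, 8, 8)
target[..., 4:] = 1.0

clean = target.clone()        # perfect reconstruction
fringed = target.clone()
fringed[:, 1, :, 4] = 0.6     # green drops on the edge column: magenta halo
interior = target.clone()
interior[:, 1, :, 7] = 0.0    # pure magenta stripe far from the edge

loss_clean = anti_pink_fringe_loss(clean, target).item()
loss_fringed = anti_pink_fringe_loss(fringed, target).item()
loss_interior = anti_pink_fringe_loss(interior, target).item()
print(loss_clean, loss_fringed, loss_interior)
```

Only the halo on the edge column is penalized. The magenta stripe away from the edge falls outside the mask, which matches the stated intent of leaving intentional pink and purple areas alone.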

Stability measures

Several safety measures were used to avoid NaN collapse and unstable VAE behavior:

MAX_BAD_STEPS = 200
LATENT_CLAMP = 3.0
GRAD_CLIP = 0.03
optimizer = torch.optim.AdamW(trainable, lr=LR, weight_decay=WEIGHT_DECAY, eps=1e-4)

Latents were clamped before decoding:

latents = torch.nan_to_num(
    latents,
    nan=0.0,
    posinf=LATENT_CLAMP,
    neginf=-LATENT_CLAMP
).clamp(-LATENT_CLAMP, LATENT_CLAMP)

Decoded images were also sanitized before loss calculation:

recon_f = torch.nan_to_num(
    recon_f,
    nan=0.0,
    posinf=1.0,
    neginf=-1.0
).clamp(-2.0, 2.0)

If non-finite gradients were detected, that step was skipped instead of updating the weights.
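
The skip-on-bad-gradient rule could look roughly like the sketch below. `safe_step` is a hypothetical helper, the single linear layer is a toy model, and clipping by gradient norm is an assumption, since the page does not say which clipping function was used. The role of `MAX_BAD_STEPS` is not described either, so it is left out.

```python
import torch
from torch import nn

GRAD_CLIP = 0.03

def safe_step(optimizer, params):
    """Clip and apply gradients; skip the update if any gradient is non-finite."""
    params = [p for p in params if p.grad is not None]
    finite = all(bool(torch.isfinite(p.grad).all()) for p in params)
    if not finite:
        # Drop the bad gradients and leave the weights untouched.
        optimizer.zero_grad(set_to_none=True)
        return False
    torch.nn.utils.clip_grad_norm_(params, GRAD_CLIP)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return True

torch.manual_seed(0)
layer = nn.Linear(2, 1)
opt = torch.optim.AdamW(layer.parameters(), lr=1e-8, weight_decay=0.0, eps=1e-4)

# Simulate a bad step by injecting a NaN into one gradient entry.
before = layer.weight.detach().clone()
layer(torch.ones(1, 2)).sum().backward()
layer.weight.grad[0, 0] = float("nan")
stepped_bad = safe_step(opt, list(layer.parameters()))

# A normal step goes through.
layer(torch.ones(1, 2)).sum().backward()
stepped_good = safe_step(opt, list(layer.parameters()))
print(stepped_bad, stepped_good)
```

Checking before calling `optimizer.step()` matters with Adam-style optimizers, because a single NaN gradient would otherwise contaminate the running moment estimates for every later step.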

Key format and saving

The base VAE used ComfyUI/WebUI-style Qwen Image VAE key names.
For training, the weights were loaded into diffusers.AutoencoderKLQwenImage using a shape-and-order based key mapping.

After training, the model was saved back in the original ComfyUI/WebUI Qwen Image VAE safetensors key format, so it can be used directly in ComfyUI-style workflows.
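
The exact mapping procedure is not given on this page. One way a shape-and-order based mapping could work is sketched below in plain Python. `map_keys_by_shape_and_order` is a hypothetical function, and the key names and shapes in the example are illustrative only, not read from the actual checkpoint.

```python
def map_keys_by_shape_and_order(src_shapes, dst_shapes):
    """Pair source keys with destination keys.

    Both arguments are insertion-ordered dicts of key -> shape tuple. Each source
    key is matched to the first unused destination key with an identical shape.
    """
    unused = list(dst_shapes.items())
    mapping = {}
    for src_key, shape in src_shapes.items():
        for i, (dst_key, dst_shape) in enumerate(unused):
            if dst_shape == shape:
                mapping[src_key] = dst_key
                del unused[i]
                break
        else:
            raise KeyError(f"no destination tensor with shape {shape} for {src_key}")
    if unused:
        raise KeyError(f"unmatched destination keys: {[k for k, _ in unused]}")
    return mapping

# Illustrative key names and shapes (assumptions for this example).
comfy_shapes = {
    "decoder.head.0.gamma": (96, 1, 1, 1),
    "decoder.head.2.weight": (3, 96, 3, 3, 3),
    "decoder.head.2.bias": (3,),
}
diffusers_shapes = {
    "decoder.norm_out.gamma": (96, 1, 1, 1),
    "decoder.conv_out.weight": (3, 96, 3, 3, 3),
    "decoder.conv_out.bias": (3,),
}

mapping = map_keys_by_shape_and_order(comfy_shapes, diffusers_shapes)
# Inverting the mapping restores the original key names when saving.
inverse = {dst: src for src, dst in mapping.items()}
print(mapping)
```

A mapping like this is only safe when both checkpoints list tensors in the same architectural order, since many layers share identical shapes. Raising on any unmatched key makes a silent mismatch less likely.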

The saved weights were exported as bfloat16 safetensors.

Notes

Because this is a subtle VAE tuning, the difference may be easiest to see in side-by-side comparisons using the same seed, same model, same prompt, and same latent image, changing only the VAE.

Look especially at:

  • black line clarity

  • hair strand separation

  • eye highlights

  • edge halos

  • red / orange / purple color boundaries

  • smoothness of skin and fabric gradients

  • whether fine details become muddy or remain readable

Disclaimer

This VAE is an experimental fine-tuned VAE for visual comparison and workflow testing.
Results may vary depending on the base model, sampler, prompt, LoRA, and image style.