Sign In

UltraFlux VAE (Mirrored from Hugging Face Repo)

12

179

0

4

Verified:

SafeTensor

Type

VAE

Stats

179

0

Reviews

Published

Dec 22, 2025

Base Model

Flux.1 D

Hash

AutoV2
2BF9AD6856

The FLUX.1 [dev] Model is licensed by Black Forest Labs. Inc. under the FLUX.1 [dev] Non-Commercial License. Copyright Black Forest Labs. Inc.

IN NO EVENT SHALL BLACK FOREST LABS, INC. BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL.

UltraFlux VAE is a high-performance Variational Autoencoder specifically engineered to solve the "softness" and memory bottlenecks associated with native 4K image generation in the Flux ecosystem. While standard Flux models typically utilize an F8 VAE that results in massive latent grids and slow processing times at high resolutions, UltraFlux adopts a more efficient F16 (16x downsampling) architecture. This significantly reduces the computational load and increases throughput, but because aggressive compression often risks losing micro-details, the developers implemented a specialized non-adversarial post-training scheme. During this phase, the decoder is fine-tuned on a massive 1-million-image 4K dataset using a unique SNR-Aware Huber Wavelet objective, which specifically prioritizes high-frequency fidelity. This allows the VAE to reconstruct sharp textures—such as hair, skin pores, and fine text—that are typically blurred by traditional compression methods. Beyond its technical efficiency, the UltraFlux VAE is designed as a "plug-and-play" solution for high-fidelity workflows like ComfyUI, where it is often used to instantly sharpen images without the need for time-consuming high-res fix steps or external upscalers. By shifting the heavy lifting to a more compressed F16 latent space while maintaining an ultra-detailed reconstruction through its fine-tuned decoder, it effectively bridges the gap between speed and perceptual quality. This makes it a cornerstone of the broader UltraFlux project, which aims to provide a unified framework for generating high-quality images across diverse aspect ratios (wide, square, and portrait) while matching or even exceeding the clarity of proprietary 4K models. UltraFlux VAE is a specialized Variational Autoencoder designed to enable high-fidelity, native 4K image generation within the Flux architecture. This VAE significantly improves the quality of your Flux and Z-Image Turbo outputs. I stumbled upon this when I was looking through stuffs on Hugging Face and after seeing how good this is I felt it should be known, used and praised by more folks so I mirrored it here on CivitAI.

While standard Flux & Z-Image Turbo models typically struggle at 4K resolution due to memory constraints and loss of detail, the UltraFlux VAE addresses these issues through several key innovations:

1. High-Resolution Optimization -

* F16 Compression: Unlike the standard Flux VAE (which often uses F8 downsampling), UltraFlux adopts an F16 VAE. This reduces the latent grid size by half (e.g., from 512×512 to 256×256), making the 4K generation process significantly faster and more memory-efficient.

* 4K Post-Training: To compensate for the aggressive F16 compression, the decoder underwent a non-adversarial post-training phase using a high-detail subset of the MultiAspect-4K-1M dataset (a corpus of 1 million 4K images).

2. Detail Preservation

* Wavelet Reconstruction Loss: The VAE was fine-tuned using a "wavelet loss" objective that specifically targets high-frequency information. This ensures that micro-details—such as skin texture, hair, and fine environmental elements—remain sharp during the decoding process.

* Micro-Contrast Enhancement: Users have noted that this VAE acts almost like an "unsharp mask" or a high-end sharpening filter, resolving softened details that standard VAEs might blur at high resolutions.

3. Practical Usage

* Plug-and-Play: It is often used as a standalone replacement for the standard Flux VAE in workflows like ComfyUI to instantly "sharpen" images without needing complex high-res fix steps.

* Native 4K Focus: It is part of the broader UltraFlux project, which co-designs the data, architecture (using Resonance 2D RoPE), and the VAE to maintain consistent quality across diverse aspect ratios (wide, tall, and square).

This is a reupload of resource from this Hugging Face Repo -

https://huggingface.co/Owen777/UltraFlux-v1/tree/main/vae