
GGUF: HyperFlux 8-Steps (Flux.1 Dev + ByteDance HyperSD LoRA)

Type: Checkpoint Merge
Published: Sep 3, 2024
Base Model: Flux.1 D
Hash (AutoV2): E5804EB20B
Creator: nakif0968
The FLUX.1 [dev] Model is licensed by Black Forest Labs, Inc. under the FLUX.1 [dev] Non-Commercial License. Copyright Black Forest Labs, Inc.
IN NO EVENT SHALL BLACK FOREST LABS, INC. BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL.

[Note: Unzip the download to get the GGUF file. Civitai doesn't host the GGUF format natively, hence this workaround.]

A merge of Flux.1 Dev with the 8-step Hyper-SD LoRA from ByteDance, converted to GGUF. The result is an ultra memory-efficient and fast Dev (CFG-sensitive) model that generates fully denoised images in just 8 steps while consuming ~6.2 GB of VRAM (for the Q4_0 quant).
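For the curious, the merge itself is conceptually simple: each LoRA delta is folded back into the corresponding base weight before quantization. Here is a minimal PyTorch sketch of that idea, assuming generic lora_A/lora_B key naming and an illustrative scale; the actual checkpoint was produced with the author's own tooling, so all file names and keys below are hypothetical:

```python
import torch
from safetensors.torch import load_file, save_file

base = load_file("flux1-dev.safetensors")             # hypothetical path
lora = load_file("hyper-flux-8steps-lora.safetensors")  # hypothetical path
scale = 0.125  # hypothetical: alpha / rank

merged = dict(base)
for key in base:
    a_key, b_key = f"{key}.lora_A", f"{key}.lora_B"  # illustrative key naming
    if a_key in lora and b_key in lora:
        # W' = W + scale * (B @ A), the core of any LoRA merge
        delta = scale * (lora[b_key].float() @ lora[a_key].float())
        merged[key] = (base[key].float() + delta).to(base[key].dtype)

save_file(merged, "hyperflux-merged.safetensors")
```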

It can be used in ComfyUI with a GGUF loader custom node or with Forge UI. See https://github.com/lllyasviel/stable-diffusion-webui-forge/discussions/1050 to learn more about Forge UI's GGUF support and for links to the VAE, clip_l, and t5xxl downloads.
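If you want to verify which quantization a downloaded file actually uses, the `gguf` pip package (maintained alongside llama.cpp) can read the tensor metadata. A small sketch, with a hypothetical file name:

```python
# Inspect a GGUF file's tensors and their quantization types.
from gguf import GGUFReader

reader = GGUFReader("HyperFlux-8steps-Q4_0.gguf")  # hypothetical file name
for tensor in reader.tensors[:5]:
    # tensor_type is a quantization-type enum, e.g. Q4_0 or F16
    print(tensor.name, tensor.tensor_type.name, tuple(tensor.shape))
```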

Advantages Over FastFlux and Other Dev-Schnell Merges

  • Much better quality: noticeably better detail and expressiveness at 8 steps than Schnell-based models like FastFlux.

  • CFG/Guidance sensitivity: since this is a Dev model, unlike the hybrid merges, you keep full (distilled) CFG sensitivity, i.e., you can trade prompt adherence against creativity and softness against saturation (see the sketch after this list).

  • Full compatibility with Dev LoRAs, better than what Schnell-based models offer.

  • The only disadvantage: it needs 8 steps for best quality. But then, you'd probably run at least 8 steps for best results with Schnell anyway.
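To make the CFG point concrete, here is a hedged sketch using the diffusers FluxPipeline as a stand-in (the page itself targets ComfyUI/Forge, and this loads the stock Dev weights rather than this merge; model id, prompt, and values are illustrative):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # lets the model run even with limited VRAM

image = pipe(
    "a red fox in fresh snow, golden hour",  # illustrative prompt
    num_inference_steps=8,  # 8 steps is the whole point of this merge
    guidance_scale=3.5,     # distilled guidance: lower = softer, more creative;
                            # higher = more prompt-adherent, more saturated
).images[0]
image.save("fox.png")
```

The same two knobs, step count and distilled guidance, roughly map onto the Steps and Distilled CFG Scale fields in Forge and the corresponding nodes in ComfyUI.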

Which model should I download?

[Current situation: using the updated Forge UI and ComfyUI (GGUF node), I can run Q8_0 on my 11 GB GTX 1080 Ti.]

Download the one that fits in your VRAM. The extra inference cost between quants is small as long as the model fits on the GPU. Size order is Q4_0 < Q4_1 < Q5_0 < Q5_1 < Q8_0; see the size estimate sketch after this list.

  • Q4_0 and Q4_1 should fit in 8 GB VRAM

  • Q5_0 and Q5_1 should fit in 11 GB VRAM

  • Q8_0 if you have more!
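These fit estimates follow directly from each quant format's bits per weight. A quick back-of-the-envelope sketch for the transformer weights alone (the bpw values are the standard llama.cpp block sizes; the ~12B parameter count for Flux.1 Dev is an assumption, and text encoders and VAE add to the total):

```python
# Approximate on-disk / in-VRAM size of the quantized transformer weights.
BITS_PER_WEIGHT = {"Q4_0": 4.5, "Q4_1": 5.0, "Q5_0": 5.5, "Q5_1": 6.0, "Q8_0": 8.5}
N_PARAMS = 12e9  # assumed parameter count of Flux.1 Dev's transformer

for quant, bpw in BITS_PER_WEIGHT.items():
    gib = N_PARAMS * bpw / 8 / 2**30
    print(f"{quant}: ~{gib:.1f} GiB")  # Q4_0 lands near the ~6.2 GB figure above
```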

Note: With CPU offloading, you can still run a model even if it doesn't fit in your VRAM.

All the license terms associated with Flux.1 Dev apply.

PS: Credit goes to ByteDance for the Hyper-SD Flux 8-step LoRA, which can be found at https://huggingface.co/ByteDance/Hyper-SD/tree/main