Generation is much faster when you don't load LoRAs at runtime in your workflow. So if you've got the disk space, I recommend merging them into the model instead.
Steps:
1. Merge your LoRA into the original 24GB Flux .safetensors file
2. Convert the 24GB .safetensors file to a 24GB .gguf file
3. Quantize the 24GB .gguf file down to a 12GB .gguf file (or whichever quant level you like)
Use this workflow to handle the first step: https://civitai.com/api/download/attachments/137148
It's a workflow you'll only run once, to save out a model file. That file will be a merge of the original uncompressed Flux dev model (the 24GB file) and your chosen LoRA, so make sure both files are in the correct directories and selected in the workflow.
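If you're curious what the merge step is actually doing, here's a minimal conceptual sketch in Python. The key naming (`lora_up`/`lora_down`, the mapping to base keys) and the alpha scaling are assumptions for illustration only; real Flux LoRAs use several different naming schemes, which is exactly why the linked workflow is the safer route.

```python
# Illustrative sketch of a LoRA merge, NOT the workflow's exact logic.
# Assumes a simplified key scheme ("<name>.lora_up.weight" /
# "<name>.lora_down.weight") -- real Flux LoRAs vary, so treat this
# as a conceptual example only.
import torch
from safetensors.torch import load_file, save_file

base = load_file("flux1-dev.safetensors")            # the ~24GB base model
lora = load_file("my_lora.safetensors")
strength = 1.0                                       # LoRA weight multiplier

for key in list(lora.keys()):
    if not key.endswith(".lora_up.weight"):
        continue
    name = key[: -len(".lora_up.weight")]
    up = lora[key].float()                           # shape: (out, rank)
    down = lora[name + ".lora_down.weight"].float()  # shape: (rank, in)
    alpha = lora.get(name + ".alpha")
    scale = strength * (alpha.item() / down.shape[0] if alpha is not None else 1.0)
    target = name + ".weight"                        # assumed mapping to a base key
    if target in base:
        # Fold the low-rank update into the base weight: W' = W + s * (up @ down)
        base[target] = (base[target].float() + scale * (up @ down)).to(base[target].dtype)

save_file(base, "flux1-dev-merged.safetensors")
```

The point is that the LoRA's low-rank update gets baked into the weights once, so there's nothing extra to compute at generation time.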
Run the workflow, wait for it to complete, and check your output folder for the new file. You could stop here if your PC can run this merged 24GB file at full speed, but even my RTX 3090 can't, so I convert it into a 12GB quantized .gguf file.
Here's how to do that:
If you're on Windows you can use this simple tool I put together: https://github.com/rainlizard/EasyQuantizationGUI/releases
It'll convert the .safetensors file to .gguf and quantize it with one click.
Linux users will instead have to follow the instructions here to quantize: https://github.com/city96/ComfyUI-GGUF/tree/main/tools
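For a feel of why the quantized file is roughly half the size: GGUF quant formats store weights in small blocks, each as low-bit integers plus a per-block scale. Here's a toy round trip in that style; this is an illustration of the general idea, not the actual llama.cpp/GGUF implementation:

```python
# Toy illustration of Q8_0-style block quantization (32 weights per block,
# int8 codes plus one fp16 scale per block). The real code lives in
# llama.cpp; this just shows how ~16 bits/weight drops to ~8.5 bits/weight.
import numpy as np

def quantize_q8_0(weights: np.ndarray, block_size: int = 32):
    blocks = weights.reshape(-1, block_size).astype(np.float32)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0                        # avoid divide-by-zero
    q = np.round(blocks / scales).astype(np.int8)    # 8-bit codes
    return q, scales.astype(np.float16)

def dequantize_q8_0(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales.astype(np.float32)).ravel()

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_q8_0(w)
w2 = dequantize_q8_0(q, s)
print("max error:", np.abs(w - w2).max())            # small, so quality holds up
```

Lower quant levels use fewer bits per weight, which is why you can trade a little quality for an even smaller file.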
Enjoy your faster generation speeds!