Hardware config:
CPU: 12th Gen Intel(R) Core(TM) i5-12400F
Total RAM 32628 MB
Device: cuda:0 NVIDIA GeForce RTX 3060
Total VRAM 8192 MB
System config:
Platform: Windows 11
NVIDIA Driver: 560.94
Cuda compilation tools (Cuda Toolkit), release 12.6, V12.6.20
Python version: 3.11.9
pytorch version: 2.3.1+cu121
ComfyUI config:
ComfyUI Revision: 2568 [b29b3b86]
Set vram state to: NORMAL_VRAM
Models to test:
black-forest-labs/FLUX.1-dev
Kijai/flux-fp8 (fp8_e4m3fn) (fp8 diffusion)
Comfy-Org/flux1-dev (fp8 checkpoint)
lllyasviel/flux1-dev-bnb-nf4 (v1 & v2)
city96/FLUX.1-dev-gguf (Q4 & Q5)
LoRAs to test:
Creators train their LoRAs with different methods, tools, and parameters, so I picked three popular LoRAs to test their speed and compatibility.
Enhancement LoRA:
https://civitai.com/models/631986/xlabs-flux-realism-lora?modelVersionId=706528
Style LoRA:
https://civitai.com/models/128568/cyberpunk-anime-style?modelVersionId=747534
Celebrity LoRA:
https://civitai.com/models/121544/natalie-portman-sdxl-flux?modelVersionId=735449
* Attached is the workflow used for testing.
flux1-dev:
↓ First execution, including model loading time and CLIP text encoding.
100%|█████████████████████████████████████████████████████████| 20/20 [02:48<00:00, 8.45s/it]
Prompt executed in 287.57 seconds
↓ Execute again without reloading the model.
100%|█████████████████████████████████████████████████████████| 20/20 [02:28<00:00, 7.42s/it]
Prompt executed in 153.39 seconds
↓ flux1-dev + xlabs-flux-realism-lora.
100%|█████████████████████████████████████████████████████████| 20/20 [02:46<00:00, 8.35s/it]
Prompt executed in 187.76 seconds
↓ flux1-dev + cyberpunk-anime-style.
100%|█████████████████████████████████████████████████████████| 20/20 [03:17<00:00, 9.87s/it]
Prompt executed in 208.24 seconds
↓ flux1-dev + natalie-portman-sdxl-flux.
100%|█████████████████████████████████████████████████████████| 20/20 [03:18<00:00, 9.92s/it]
Prompt executed in 239.02 seconds
flux1-dev-fp8 diffusion model:
↓ First execution, including model loading time and CLIP text encoding.
100%|█████████████████████████████████████████████████████████| 20/20 [01:57<00:00, 5.88s/it]
Prompt executed in 186.65 seconds
↓ Execute again without reloading the model.
100%|█████████████████████████████████████████████████████████| 20/20 [02:00<00:00, 6.02s/it]
Prompt executed in 123.75 seconds
↓ flux1-dev-fp8 diffusion + xlabs-flux-realism-lora.
100%|█████████████████████████████████████████████████████████| 20/20 [01:59<00:00, 5.98s/it]
Prompt executed in 134.85 seconds
↓ flux1-dev-fp8 diffusion + cyberpunk-anime-style.
100%|█████████████████████████████████████████████████████████| 20/20 [02:23<00:00, 7.18s/it]
Prompt executed in 169.46 seconds
↓ flux1-dev-fp8 diffusion + natalie-portman-sdxl-flux.
100%|█████████████████████████████████████████████████████████| 20/20 [07:52<00:00, 23.60s/it]
Prompt executed in 498.94 seconds
flux1-dev-fp8 checkpoint model:
↓ First execution, including model loading time and CLIP text encoding.
100%|█████████████████████████████████████████████████████████| 20/20 [02:03<00:00, 6.17s/it]
Prompt executed in 195.66 seconds
↓ Execute again without reloading the model.
100%|█████████████████████████████████████████████████████████| 20/20 [01:58<00:00, 5.91s/it]
Prompt executed in 122.51 seconds
↓ flux1-dev-fp8 checkpoint model + xlabs-flux-realism-lora.
100%|█████████████████████████████████████████████████████████| 20/20 [02:07<00:00, 6.39s/it]
Prompt executed in 144.82 seconds
↓ flux1-dev-fp8 checkpoint model + cyberpunk-anime-style.
100%|█████████████████████████████████████████████████████████| 20/20 [04:12<00:00, 12.63s/it]
Prompt executed in 280.27 seconds
↓ flux1-dev-fp8 checkpoint model + natalie-portman-sdxl-flux.
100%|█████████████████████████████████████████████████████████| 20/20 [18:06<00:00, 54.35s/it]
Prompt executed in 1111.82 seconds
flux1-dev-bnb-nf4:
↓ First execution, including model loading time and CLIP text encoding.
100%|█████████████████████████████████████████████████████████| 20/20 [01:37<00:00, 4.89s/it]
Prompt executed in 342.23 seconds
↓ Execute again without reloading the model.
100%|█████████████████████████████████████████████████████████| 20/20 [01:37<00:00, 4.88s/it]
Prompt executed in 103.47 seconds
↓ flux1-dev-bnb-nf4 + xlabs-flux-realism-lora:
!!! Exception during processing !!! .to() does not accept copy argument
↓ flux1-dev-bnb-nf4 + cyberpunk-anime-style:
!!! Exception during processing !!! .to() does not accept copy argument
↓ flux1-dev-bnb-nf4 + natalie-portman-sdxl-flux:
!!! Exception during processing !!! .to() does not accept copy argument
flux1-dev-bnb-nf4-v2:
↓ First execution, including model loading time and CLIP text encoding.
100%|█████████████████████████████████████████████████████████| 20/20 [09:25<00:00, 28.27s/it]
Prompt executed in 611.85 seconds
↓ Execute again without reloading the model.
100%|█████████████████████████████████████████████████████████| 20/20 [05:41<00:00, 17.06s/it]
Prompt executed in 348.14 seconds
↓ flux1-dev-bnb-nf4-v2 + xlabs-flux-realism-lora:
!!! Exception during processing !!! .to() does not accept copy argument
↓ flux1-dev-bnb-nf4-v2 + cyberpunk-anime-style:
!!! Exception during processing !!! .to() does not accept copy argument
↓ flux1-dev-bnb-nf4-v2 + natalie-portman-sdxl-flux:
!!! Exception during processing !!! .to() does not accept copy argument
flux1-dev-Q4_0:
↓ First execution, including model loading time and CLIP text encoding.
100%|█████████████████████████████████████████████████████████| 20/20 [02:04<00:00, 6.22s/it]
Prompt executed in 157.69 seconds
↓ Execute again without reloading the model.
100%|█████████████████████████████████████████████████████████| 20/20 [02:03<00:00, 6.15s/it]
Prompt executed in 127.29 seconds
↓ flux1-dev-Q4_0 + xlabs-flux-realism-lora.
100%|█████████████████████████████████████████████████████████| 20/20 [02:05<00:00, 6.29s/it]
Prompt executed in 137.08 seconds
↓ flux1-dev-Q4_0 + cyberpunk-anime-style.
100%|█████████████████████████████████████████████████████████| 20/20 [02:46<00:00, 8.33s/it]
Prompt executed in 176.57 seconds
↓ flux1-dev-Q4_0 + natalie-portman-sdxl-flux.
100%|█████████████████████████████████████████████████████████| 20/20 [03:05<00:00, 9.27s/it]
Prompt executed in 194.95 seconds
flux1-dev-Q5_0:
↓ First execution, including model loading time and CLIP text encoding.
100%|█████████████████████████████████████████████████████████| 20/20 [02:45<00:00, 8.30s/it]
Prompt executed in 196.78 seconds
↓ Execute again without reloading the model.
100%|█████████████████████████████████████████████████████████| 20/20 [02:37<00:00, 7.87s/it]
Prompt executed in 162.54 seconds
↓ flux1-dev-Q5_0 + xlabs-flux-realism-lora.
100%|█████████████████████████████████████████████████████████| 20/20 [02:49<00:00, 8.47s/it]
Prompt executed in 182.82 seconds
↓ flux1-dev-Q5_0 + cyberpunk-anime-style.
100%|█████████████████████████████████████████████████████████| 20/20 [06:54<00:00, 20.73s/it]
Prompt executed in 425.56 seconds
↓ flux1-dev-Q5_0 + natalie-portman-sdxl-flux.
100%|████████████████████████████████████████████████████████| 20/20 [06:37<00:00, 19.86s/it]
Prompt executed in 407.55 seconds
↓ Comparison of models:
↓ Models with xlabs-flux-realism-lora
↓ Models with cyberpunk-anime-style. Bad hands appear.
↓ Models with natalie-portman-sdxl-flux. Extra arms appear.
↓ The time required for each model under different circumstances.
↓ Charts for easier reading.
Fastest models for repeated generation (second execution, model already loaded):
flux1-dev-bnb-nf4: 103.47 seconds
fp8 diffusion, fp8 checkpoint, Q4_0: 120~129 seconds
flux1-dev: 153.39 seconds
Q5_0: 162.54 seconds
flux1-dev-bnb-nf4-v2: 348.14 seconds
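The ranking above can be reproduced from the console output. Here is a minimal sketch, assuming the "Prompt executed in ... seconds" lines from each model's second execution are copied by hand into a dictionary. The names SECOND_RUN_LOGS, parse_seconds, and rank_models are made up for illustration and are not part of ComfyUI.

```python
import re

# Second-execution log lines copied from the results in this post.
# The dictionary keys are labels chosen for this sketch.
SECOND_RUN_LOGS = {
    "flux1-dev": "Prompt executed in 153.39 seconds",
    "fp8 diffusion": "Prompt executed in 123.75 seconds",
    "fp8 checkpoint": "Prompt executed in 122.51 seconds",
    "flux1-dev-bnb-nf4": "Prompt executed in 103.47 seconds",
    "flux1-dev-bnb-nf4-v2": "Prompt executed in 348.14 seconds",
    "Q4_0": "Prompt executed in 127.29 seconds",
    "Q5_0": "Prompt executed in 162.54 seconds",
}

PATTERN = re.compile(r"Prompt executed in ([\d.]+) seconds")


def parse_seconds(line: str) -> float:
    """Extract the total execution time from one log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a 'Prompt executed' line: {line!r}")
    return float(match.group(1))


def rank_models(logs: dict[str, str]) -> list[tuple[str, float]]:
    """Return (model, seconds) pairs sorted from fastest to slowest."""
    timed = [(name, parse_seconds(line)) for name, line in logs.items()]
    return sorted(timed, key=lambda pair: pair[1])


ranking = rank_models(SECOND_RUN_LOGS)
for name, seconds in ranking:
    print(f"{name:22s} {seconds:7.2f} s")
```

Sorting on the total time rather than the s/it figure keeps VAE decoding and other per-run overhead in the comparison, which is what matters when generating image after image.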
Shortest loading time (first execution time minus second execution time):
Q4_0, Q5_0: 30~50 seconds
fp8 diffusion, fp8 checkpoint: 51~100 seconds
flux1-dev: 101~200 seconds
flux1-dev-bnb-nf4, flux1-dev-bnb-nf4-v2: More than 200 seconds
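The loading-time buckets above come from a simple subtraction. A minimal sketch, using the "Prompt executed" totals from the logs in this post. The names RUNS and load_overhead are made up for illustration. Note that the first execution also includes CLIP text encoding, so the difference is an upper bound on pure model loading time.

```python
# (first execution, second execution) totals in seconds, taken from the logs.
RUNS = {
    "flux1-dev": (287.57, 153.39),
    "fp8 diffusion": (186.65, 123.75),
    "fp8 checkpoint": (195.66, 122.51),
    "flux1-dev-bnb-nf4": (342.23, 103.47),
    "flux1-dev-bnb-nf4-v2": (611.85, 348.14),
    "Q4_0": (157.69, 127.29),
    "Q5_0": (196.78, 162.54),
}


def load_overhead(first: float, second: float) -> float:
    """Time spent on one-off work (model loading, CLIP encoding) in seconds."""
    return round(first - second, 2)


overheads = {name: load_overhead(*times) for name, times in RUNS.items()}

# Print from shortest to longest loading overhead.
for name, seconds in sorted(overheads.items(), key=lambda item: item[1]):
    print(f"{name:22s} {seconds:7.2f} s")
```

For example, flux1-dev gives 287.57 - 153.39 = 134.18 seconds, which lands in the 101~200 bucket.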
LoRAs:
xlabs-flux-realism-lora runs at close to no-LoRA speed on every model that accepts LoRAs (all tested models except the two bnb-nf4 variants)
natalie-portman-sdxl-flux works best on flux1-dev and Q4_0
cyberpunk-anime-style is suitable for flux1-dev, fp8 diffusion and Q4_0
If you use LoRAs often, flux1-dev and Q4_0 are the most consistent choices. fp8 diffusion can be much slower in some cases (23.60s/it with natalie-portman-sdxl-flux).
The flux1-dev-fp8 checkpoint and flux1-dev-Q5_0 slow down sharply with the style and celebrity LoRAs.
The slowness of flux1-dev-Q5_0 may be caused by insufficient VRAM on this 8 GB card.
At the time of this test, ComfyUI did not support LoRAs with flux1-dev-bnb-nf4 or flux1-dev-bnb-nf4-v2: every attempt raised the ".to() does not accept copy argument" exception.
The celebrity LoRA may need to reproduce finer detail, which could explain why it consumes more resources.
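To put numbers on the LoRA observations, each LoRA run can be compared with the same model's second (no-LoRA) execution. A minimal sketch, using the s/it figures from the progress bars in this post. The names SEC_PER_IT and slowdown are made up for illustration, and a single run per combination means the ratios are indicative, not precise.

```python
# Seconds per iteration from the progress bars: the no-LoRA second execution
# ("base") followed by the three LoRA runs. The bnb-nf4 models are omitted
# because every LoRA run on them failed with an exception.
SEC_PER_IT = {
    "flux1-dev": {"base": 7.42, "realism": 8.35, "cyberpunk": 9.87, "portman": 9.92},
    "fp8 diffusion": {"base": 6.02, "realism": 5.98, "cyberpunk": 7.18, "portman": 23.60},
    "fp8 checkpoint": {"base": 5.91, "realism": 6.39, "cyberpunk": 12.63, "portman": 54.35},
    "Q4_0": {"base": 6.15, "realism": 6.29, "cyberpunk": 8.33, "portman": 9.27},
    "Q5_0": {"base": 7.87, "realism": 8.47, "cyberpunk": 20.73, "portman": 19.86},
}


def slowdown(timings: dict[str, float]) -> dict[str, float]:
    """Ratio of each LoRA run's s/it to the no-LoRA baseline (1.0 = no change)."""
    base = timings["base"]
    return {lora: t / base for lora, t in timings.items() if lora != "base"}


ratios = {model: slowdown(timings) for model, timings in SEC_PER_IT.items()}

for model, per_lora in ratios.items():
    cells = "  ".join(f"{lora}: {ratio:4.2f}x" for lora, ratio in per_lora.items())
    print(f"{model:15s} {cells}")
```

A ratio near 1.0 means the LoRA costs almost nothing on that model. The largest ratios belong to the fp8 checkpoint and Q5_0 rows, which matches the notes above.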
ControlNet to test:
XLabs-AI/flux-controlnet-collections/flux-canny-controlnet_v2.safetensors
↓ flux1-dev, strength 0.7
Sampling: 100%|███████████████████████████████████████████████| 25/25 [05:04<00:00, 12.19s/it]
Prompt executed in 356.17 seconds
↓ flux1-dev-fp8, strength 0.7
Sampling: 100%|███████████████████████████████████████████████| 20/20 [10:59<00:00, 32.97s/it]
Prompt executed in 680.51 seconds
XLabs-AI/flux-controlnet-collections/flux-canny-controlnet.safetensors
↓ flux1-dev-fp8, strength 0.9
Sampling: 100%|███████████████████████████████████████████████| 20/20 [11:29<00:00, 34.49s/it]
Prompt executed in 720.66 seconds
The first test uses XLabs-AI's preset workflow. I gave up testing the XLabs Sampler when I saw its speed, and I'll wait for other methods.