I made a comparison of different quantizations of the Flux.1 Dev model (GGUF, NF4, FP8, and a random mx.gguf). I wanted to compare generation speed, the amount of VRAM required, and how image quality depends on quantization precision.
I understand that such a comparison isn't entirely rigorous, since the style and composition are similar, but at least it's something. The images use slightly different prompts and different seeds, and the schedulers also differ (normal for the top row, simple for the bottom row). Everything else is the same for both rows: resolution 1024×1024, sampler euler, 20 steps, guidance 3.5.
Images were generated on an RTX 4070 Ti (12282 MB VRAM), 65455 MB RAM.
PyTorch version: 2.5.0.dev20240809+cu124
ComfyUI launched with the --fast argument
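For anyone reproducing the speed numbers, a minimal timing harness like the sketch below is one way to measure seconds per image. The `fake_generate` placeholder and the `benchmark` helper are illustrative assumptions, not part of the original setup; in practice the callable would wrap the actual ComfyUI/sampler run, and peak VRAM could be read afterwards with `torch.cuda.max_memory_allocated()`.

```python
import time

def benchmark(generate, warmup=1, runs=3):
    """Time a generation callable: run a warmup pass (to exclude
    model-load/compile overhead), then return mean seconds per run."""
    for _ in range(warmup):
        generate()
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        generate()
        times.append(time.perf_counter() - t0)
    return sum(times) / len(times)

# Placeholder standing in for a real generation call (assumption):
# replace with the actual pipeline invocation when measuring.
def fake_generate():
    time.sleep(0.01)

mean_s = benchmark(fake_generate)
print(f"{mean_s:.3f} s/image")
```

The warmup pass matters for comparisons like this one, because the first generation after a model swap includes loading and caching costs that would skew the per-image average.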
Comparison JPEG here
And here is the generation-speed graph I got: