Kontext comparison bf16, q8, fp8_scaled

Hello, fellow latent explorers!

Flux Kontext is out and, as always, there are multiple weight types available out there. I have not seen a single comparison between them, and I have clashed countless times on reddit over the quality of the different weight types of base flux.

This time it is a slightly different game: the model is actually instruction based and can follow your instructions to work with images. From what I see in the published code and the comfyui workflow, the work there is done not by an LLM but directly by the unet. Which makes me wonder: will quantization affect only the quality of the image, or also the quality of instruction execution?

Due to limited drive space and the general messiness of the comfy workflows that let you compare different models, I've settled on ForgeUI. To get the maximum out of it I also recommend installing SageAttention for Forge, which can be a little tricky. There are guides on github; write in the comments if you need condensed instructions.

The weights in scope are BF16, FP8_scaled and Q8. Why? Because I have them on my drive. FP8_scaled is basically being promoted by comfy, Q8 is the typical option that was way closer to the original in Flux1d, and BF16 is the original, included for comparison.

Comfyui now has automatic offloading (like forge, but better) that just works. If you have enough RAM, just use BF16; the speed degradation is not that noticeable. For Forge, follow the official guide on memory estimation.

As in my other comparisons, I'll link each image as a separate post. Right click on it and open it in the browser to view at full resolution.

For generation parameters I used dpm++2m beta, 30 steps, 1024x1328. I also use fp16 T5. I advise everyone to offload it to cpu; imo it takes longer to swap models than to generate the embedding on cpu (if yours is not a full potato ofc).

This is important. I found that increasing steps from 20 to 30 greatly increases output quality on image editing.

Also, there is an official prompt guide. Read it. Do not use "retain" in your prompt.

Basically, "maintain facial features, scale and proportions" is your friend.

Let's start with simple image generation.

Kontext can be used as pure t2i. It feels like flux, but easier on styles.

In my limited time with it I find that generating images below 1MP or at odd formats severely degrades the output. I did not generate much at such low res in flux1d, but I have a feeling that the results were not that bad. Correct me if I'm wrong. For example, the cover image at 848x400:

Vs 1168x736:

It just seems to give those odd artifacts everywhere.
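To put numbers on "below 1MP", here is a tiny sketch that converts the resolutions mentioned in this post to megapixels. The `megapixels` helper and the labels are just for illustration, they are not part of any tool.

```python
def megapixels(width: int, height: int) -> float:
    """Pixel count expressed in millions of pixels."""
    return width * height / 1_000_000


# Resolutions taken from this post; the labels are only descriptive.
resolutions = {
    "cover image, low res": (848, 400),
    "cover image, larger": (1168, 736),
    "main comparison runs": (1024, 1328),
}

for label, (w, h) in resolutions.items():
    print(f"{label}: {w}x{h} = {megapixels(w, h):.2f} MP")
```

Note that the larger cover image is still a bit under 1MP, so the artifacts seem to kick in somewhere well below that mark rather than exactly at it.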

Image in the style of comic art by Jack Kirby depicting a car chase action scene set in cyberpunk cityscape full of colors, skyscrapers and futuristic cars. image should have iconic visual artstyle full of Jack Kirby.

https://civitai.com/images/86880630

This is what I was talking about. Way more stylized than I ever got with flux1d, yet too vibrant for my liking. Regarding details, not much change: bf16 has more detail, q8 is kinda the same image with less detail, and fp8 is a slightly more different image, but not by much.

Wait wait wait:

Well, that's some nightmare fuel from fp8

Now let's check for fluxchin, because maybe?

Professional portrait photo of luscious woman in victorian era dress. Slight bokeh and studio lighting emphasize minute details of her perfect face: detailed skin texture, lush curled brown hair in updo hairstyle, long eyelashes and grey eyes looking at viewer. She is seated slightly turned towards camera. Her red embroidered dress in adorned with lace trim and frills, giving this image an overall luxurious vibe of professional photosession for instagram photo.

https://civitai.com/images/86883974

Well, fluxchin is here. Let's hope chroma will not have that. Same differences detail-wise; just have a closer look at the embroidery. What's interesting: on the last image the quantized versions traded the earring for a naughty lock.

Interesting part starts here

Before going in, just a firm reminder: when feeding an image as a latent to kontext you greatly increase the memory needed for inference. The bigger the image, the more vram it takes and the slower it runs. This is important for forge, with its manual memory allocation. You can be fine with t2i, but feeding a reference suddenly takes it to 155s/it. That means inference has leaked into shared memory, so tweak the memory management. That is also the primary reason why the kontext input is by default resized to the "BFL recommended resolution", which is just a bunch of different aspect ratios all around 1MP.
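A rough sketch of why a reference image costs so much. This is my assumption about how it works, not something measured here: a Flux-style model with an 8x VAE downscale and 2x2 latent patches, where the reference latent is appended to the same token sequence as the image being generated. Text tokens are ignored, and the function names are made up for illustration.

```python
from typing import Optional, Tuple


def latent_tokens(width: int, height: int,
                  vae_downscale: int = 8, patch: int = 2) -> int:
    """Image tokens for one image (assumed: 8x VAE downscale, 2x2 patches)."""
    step = vae_downscale * patch  # 16 pixels per token side
    return (width // step) * (height // step)


def sequence_length(width: int, height: int,
                    reference: Optional[Tuple[int, int]] = None) -> int:
    """Image tokens seen by the model, with an optional reference image.

    Assumption: the reference tokens are concatenated to the output tokens.
    """
    tokens = latent_tokens(width, height)
    if reference is not None:
        tokens += latent_tokens(*reference)
    return tokens


t2i = sequence_length(1024, 1328)
edit = sequence_length(1024, 1328, reference=(1024, 1328))

print("t2i tokens: ", t2i)
print("edit tokens:", edit)
# Self-attention cost grows roughly with the square of the sequence length.
print("approx attention cost ratio:", (edit / t2i) ** 2)
```

Under these assumptions a same-size reference doubles the image tokens and roughly quadruples the attention work, which would fit with a setup that is fine for t2i suddenly spilling out of vram.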

Let's take the top left image from the last generation and tweak it. Here I used a non-default resolution and will not rescale it. It is not that much of a difference, and it is normal in terms of pixel count for the vae (divisible by 16).
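If you want to pick your own input size instead of the default resize, a minimal sketch could look like this. `fit_resolution` is a hypothetical helper, not the BFL list and not something from Forge or comfy: it scales an image towards a target pixel count and snaps both sides to multiples of 16.

```python
import math
from typing import Tuple


def is_vae_friendly(width: int, height: int, multiple: int = 16) -> bool:
    """True if both sides are divisible by `multiple`."""
    return width % multiple == 0 and height % multiple == 0


def fit_resolution(width: int, height: int,
                   target_pixels: int = 1024 * 1024,
                   multiple: int = 16) -> Tuple[int, int]:
    """Scale towards `target_pixels`, keeping the aspect ratio approximately,
    and snap each side to the nearest multiple of `multiple`."""
    scale = math.sqrt(target_pixels / (width * height))

    def snap(value: int) -> int:
        return max(multiple, round(value * scale / multiple) * multiple)

    return snap(width), snap(height)


# The resolution used for the runs in this post already passes the check.
print(is_vae_friendly(1024, 1328))

# A larger input scaled down to roughly 1MP.
print(fit_resolution(1824, 1248))
```

The snapping changes the aspect ratio slightly, so for product shots or faces it may be worth cropping to the snapped ratio first instead of stretching.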

Make her hold sign with text "YEP, THATS A FLUXCHIN" written in cursive. Maintain background and lighting.

https://civitai.com/images/86888541

Well, that's interesting: both BF16 and Q8 failed on the second seed, but fp8 was able to write the text.

Change her dress color to deep blue with sparkly texture.

https://civitai.com/images/86888541

Well, that's why you need to run it locally and include a bunch of maintains. I found "maintain background and lighting" particularly useful. The only difference I see is that fp8 is slightly less sparkly on the second seed (right):

Turn her to a man, but keep facial features the same. Add big cheeckbones and flamboyant mustache and bulky body. Maintain scale.

https://civitai.com/images/86894057

This turned out weirder than I expected. It changed the eye color and hair color, but overall the face is consistent across images. Btw, the first seed seems to enforce that odd noisy background. My eye did not catch any significant differences, except that all 3 models left some leftovers on the first seed:

Change style to grim dark illustration drawn by Frank Frazetta. Turn woman in red dress to female warrior in armored bikini revealing her midriff, with scenic landscape behind her. Her hands rest on the hilt of the sword. Maintain facial features, eye color and hair color. Image should be reminiscent of classic fantasy and science fiction illustrations. Incorporate bold brushstrokes, vivid colors, and a sense of energy and movement.

https://civitai.com/images/86899516

Slightly different details on fp8. Not much difference in the amount of detail across all three.

Same character but in the 3 positions, front, side and back. full body against white background. Maintain eye color, hairstyle and facial features. Maintain scale and proportions.

https://civitai.com/images/86902038

Only the last seed has a ring on the right hand. On the first seed the head is too big. Not much difference otherwise.

Turn this into PLAYBOY magazine cover. Background is now popping gradient yellow. Add some titles on the cover.

https://civitai.com/images/87064140

Another example showing that some seeds seem to be just broken. The original is pristine in all cases; fp8 and q8 follow the same route.

Let's do something more tricky.

Turning anime to realistic, but with an image with multiple characters.

Original: https://civitai.com/images/25318128

make it realistic

https://civitai.com/images/87069842

Squint your eeeeyes~

For reasons unknown I decided to give it a go without any resolution change, full 1824x1248 in and out. It was 2 times slower than the previous ones. The center characters are clearly changed more. Other than that, no significant differences. Pretty mediocre result with a complex image. A couple of samefaces (well, understandable since it's anime).

Maybe some funky text edits will show something?

Original: https://civitai.com/images/30137875

Change text "YOUR TEXT" to "KONTEXT CAN?". Maintain style and composition.

https://civitai.com/images/87073259

Perfectly following the style. Text: so-so. Interestingly, this time only Q8 produced coherent text on the last seed. But yeah, you can take good stylized text and transform it to your needs using a relatively simple prompt (heavy breathing in ideogram from behind). But honestly I was not able to pick out significant differences outside of the obviously wrong letters.

Let's colorize a really bad old photo, all in one prompt.

Colorize photo taken during Russo-Japanese war. Upscale the image, make it crisp. Remove old film artifacts like white dots and smears. Maintain facial features, scale and proportions. Photo is depicting japanese army marching through the village under powerlines. Soldiers are holding rifles on their shoulders. White feathers on blue hats with yellow stripe. Red stripes for officers.

https://civitai.com/images/87075681

Well... Kontext has no idea how Japanese soldiers in 1905 looked precisely, but at least the uniform is blue. fp8 adds a slightly weird tint over the tree, especially on the first seed, but honestly that is not the main issue here. Overall the result is cool, because the original is of really bad quality.

Let's try style transfer.

Original: https://civitai.com/images/28123211

using this style create a portrait of indigenous american woman wearing tribal outfit.

https://civitai.com/images/87082432

Using this style, a torn plush bear is lying in the corner of delapitated old room]

https://civitai.com/images/87091555

Well, style transfer is better than ipadapters or redux if the image is close to the source. Outside of that it is still lacking, completely missing details and bringing in the colors at best. My guess is that style is more about small fine details, and modern neural networks are about the opposite. Interestingly, only fp8 was able to draw the bear on the first seed. Kinda.

I also tried a bunch of anime styles behind the scenes, and no, it cannot do them, defaulting to generic flux anime at best.

Double image:

Originals:

https://civitai.com/images/39438594

https://civitai.com/images/38354362

Make girl from left image and boy from right image sit against each other at the table in cafe and eat ice cream peacefully. Maintain facial features, eye color, hairstyle and overall style.

https://civitai.com/images/87084908

It bleeds. And why a realistic background? Not a single seed with the proper female eye color. Luckily, all of that can be postprocessed using kontext. Outside of that, the weights show more variation here. And fp8 decided to put the ice cream into a coffee mug on the second seed for whatever reason.

Virtual reality headset. HTC Vive

Make woman from left image wear vr headset from right image on her head. Maintain product details.

https://civitai.com/images/87089909

Interestingly, only q8 was able to write "htc" on the first seed. But it is garbage anyway. I don't see any difference outside of that.

The result: no major losses from using the q8 or fp8 models, other than the expected decrease in the quality of smaller details.