Block Weight Tests for Flux Dev

Full-size comparison images can be found here, since the article won't display the images at full size.

I noticed that my Flux LoRAs had some issues. There was a bit of over-fitting, which I'm okay with if the faces or styles are accurate. However, they were also producing bad hands and text more often. Having seen that someone on Reddit had trained a person on only two blocks (see Sandor below), I was curious whether that could really be a solution to these training issues.

I have tested styles, hands and faces, mostly using the Flux presets in the Inspire Pack's "LoRA Loader (Block Weight)".
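For context, my understanding of what a block weight does (not taken from the Inspire Pack source) is that it scales the LoRA's update separately for each transformer block, so a 0 removes that block's share of the LoRA and a 1 keeps it. Written with the standard LoRA formulation, using my own notation:

```latex
W'_{\ell} = W_{\ell} + s \cdot w_{b(\ell)} \cdot \frac{\alpha}{r} \, B_{\ell} A_{\ell}
```

Here W_l is a weight matrix of layer l in the base model, B_l A_l is the rank-r LoRA update for that layer with the usual alpha/r scale, s is the overall LoRA strength, and w_b(l) is the block-weight entry for the block that contains layer l. With w = 0, the weights of every layer in that block stay identical to plain Flux Dev.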

All the models that I have uploaded and tested here have their Kohya_ss training scripts attached on the model page.

I would like to test more specific blocks, but there are way too many combinations to do if done naively. Also, I would like to test camera angles, poses, hair, body shape and fonts later, if it appears to be worth the effort.
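To put "way too many combinations" in numbers: Flux Dev has 19 double-stream and 38 single-stream blocks, and switching each one independently on or off gives 2^n combinations. A quick check, comparing per-block toggling with toggling only the seven groups used in this article (three DBL groups and four SINGLE ranges):

```python
# Count on/off combinations, assuming Flux Dev's 19 double-stream
# and 38 single-stream transformer blocks.
DOUBLE_BLOCKS = 19
SINGLE_BLOCKS = 38


def on_off_combinations(num_units: int) -> int:
    """Number of ways to switch each of num_units independently on or off."""
    return 2 ** num_units


# Every block toggled individually.
per_block = on_off_combinations(DOUBLE_BLOCKS + SINGLE_BLOCKS)
# Only the groups used here: DBL-FRONT7, DBL-MID6, DBL-TAIL6 + four SINGLE ranges.
per_group = on_off_combinations(3 + 4)

print(f"per block: {per_block:,}")  # per block: 144,115,188,075,855,872
print(f"per group: {per_group}")    # per group: 128
```

Testing each group on its own, as in the tables below, needs only one run per group plus NONE and ALL.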

Here's a good resource on skipping blocks of Flux Dev itself.

TL;DR

These are my observations from limited tests.

Keep in mind that testing blocks in isolation may not reflect how they behave when all of them are used together.

  • DBL-FRONT7: "Neutralize"? (Outputs become more photorealistic, with higher contrast and more detail.) Composition?

  • DBL-MID6: Faces. Style?

  • DBL-TAIL6: Details for texture, style and skin. Hands? Foreground and background details? JPEG compression? Pose? Composition?

  • SINGLE-1to10: Hands? (See Sign of the Horns.) Otherwise, it had a minor impact at best.

  • SINGLE-11to20: ? (Barely had an impact.)

ComfyUI

Unet Loader (GGUF)

  • unet: Flux 1 Dev (GGUF Q4_0)

Dual Clip Loader

  • clip 1: clip-vit-large-patch14

  • clip 2: t5xxl_fp16

LoRA Loader (Block Weight) - Inspire Pack

  • NONE: bypass node

  • DBL-FRONT7: 1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0

  • DBL-MID6: 1,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0

  • DBL-TAIL6: 1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1

  • SINGLE-1to10: 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

  • SINGLE-11-20: 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

  • SINGLE-21-30: 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0

  • SINGLE-31-37: 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1

  • ALL: 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
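Typing these strings by hand is error-prone, so here is a minimal sketch that generates them. It is a hypothetical helper, not part of the Inspire Pack. It assumes the layout is one leading entry not tied to a specific block, then one entry per double block, then one per single block, based on Flux Dev having 19 double and 38 single blocks. The DBL presets above list only the first 20 entries; I haven't checked how the loader fills omitted entries, so this version writes every entry out explicitly as 0, which is only equivalent if the loader also treats missing entries as 0.

```python
# Hypothetical helper for writing block-weight strings like the presets above.
# Assumed layout: 1 leading "base" entry, then Flux Dev's 19 double blocks,
# then its 38 single blocks (58 entries in total). Indices are 0-based.
NUM_DOUBLE = 19
NUM_SINGLE = 38


def block_weights(double_on=range(0), single_on=range(0), base=1):
    """Return a comma-separated string: 1 for enabled blocks, 0 for the rest."""
    doubles = [1 if i in double_on else 0 for i in range(NUM_DOUBLE)]
    singles = [1 if i in single_on else 0 for i in range(NUM_SINGLE)]
    return ",".join(str(v) for v in [base] + doubles + singles)


# Groups mirroring the presets above (0-based block indices).
dbl_front7 = block_weights(double_on=range(0, 7))
dbl_mid6 = block_weights(double_on=range(7, 13))
dbl_tail6 = block_weights(double_on=range(13, 19))
single_1to10 = block_weights(single_on=range(0, 10))
single_11to20 = block_weights(single_on=range(10, 20))
all_blocks = block_weights(double_on=range(NUM_DOUBLE), single_on=range(NUM_SINGLE))

print(dbl_mid6)
```

Using ranges keeps each group's boundaries visible, which makes it easier to try narrower groups later.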

Example Image with Embedded ComfyUI Workflow

Styles

J.C. Leyendecker

I regretted using "cartoon" as part of the trigger phrase, but it does make the effects of the LoRA more obvious. The LoRA is unreleased because I think the dataset was a little too diverse.

Context

  • Image Count: 29

  • Captions: Prefix Trigger Phrase + Detailed Captions (Florence2, manually rewritten)

  • Prefix Trigger Phrase: Cartoon illustration by J.C. Leyendecker of

  • Prompt 1 (the same as the caption in the training data; yes, I know there's a typo): Cartoon illustration by J.C. Leyendecker of a young man sitting on a stone pedestal playing a banjo in his hands. He has short blonde hair and is wearing a beige suit, black and yellow stripped tie, black stockings and brown leather shoes. He is looking to the side and has a smile on his face. The background is orange with a shadow. Light-skinned male. The framing is a straight-on, full body shot from the front.

  • Prompt 2: A cartoon illustration by J.C. Leyendecker of a man lifting weights. He body is muscular and he is wearing a baggy t shirt. He is looking down, unaware of the viewer.

Notes

  • DBL-FRONT7: Isn't it really strange that going from no layers to just the DBL-FRONT7 layers changes the style from cartoon to photorealistic?

  • DBL-MID6: This is clearly where the style's faces were primarily learned. Maybe some of the style too? Note that, for whatever reason, in the banjo example the shadow disappeared in the DBL-MID6 test and when the whole LoRA is active.

  • DBL-TAIL6: Details. Texture?

  • SINGLE: SINGLE-1to10 and SINGLE-11to20 learned something...

Gil Elvgren

Since his painting style is generally realistic, the differences are a bit more subtle. Note, however, that my test prompt included "high-resolution image", which may have biased the output.

Context

  • Image Count: 220

  • Captions: Trigger Phrase

  • Trigger Phrase: Gil Elvgren

  • Prompt: A realistic, high-resolution image by Gil Elvgren. The central figure is a white haired elf woman with long, pointy ears. She has straight-cut bangs and her hair is straight and long. She is wearing a conservative white blouse and a long white skirt and is kneeling at the base of a gnarled tree with one hand on the ground. There are pink flowers on a tree branch in the foreground, suggesting it is spring. Her outfit is in the style of 1970s fantasy. All around her are a dense collection of flowers and plants.

Notes

  • DBL-FRONT7: More photorealistic and detailed (e.g., more clothing wrinkles, a cleft chin, more contrast in the lighting, more tiny flowers in the foreground).

  • DBL-MID6: Face (Gil Elvgren had a bit of same-face syndrome). Maybe the hands are more suggestive too?

  • DBL-TAIL6: Pose, foreground and background greenery, JPEG compression.

  • SINGLE: SINGLE-1to10 and SINGLE-11to20 learned something... SINGLE-1to10 seems to change the face, and the lighting looks slightly more painterly.

Franklin Booth

This one is interesting since the style is relatively simple. However, I don't know why the details turned out muddy. When I trained it on SDXL, the details were much clearer, although, to be fair, that version was less faithful to the style overall. Maybe it is learning JPEG artifacts at this quality?

Context

  • Image Count: 87

  • Captions: Trigger Phrase

  • Trigger Phrase: "detailed pen-and-ink illustration by Franklin Booth"

  • Prompt: Detailed pen-and-ink illustration by Franklin Booth of a man and a woman standing together on top of a grassy hill overlooking a city below. There are billowing clouds in the sky.

Notes

  • DBL-FRONT7: More contrast and clearer detail. The sky is shaded in the same way as the full LoRA.

  • DBL-MID6: Outlines and stronger shading/contrast. Rocks appeared in the grass where there were darker patches in DBL-FRONT7. The image is off-white, as it is when the LoRA is not used.

  • DBL-TAIL6: More photorealistic than DBL-MID6, but this detail matches his style well all on its own. However, as with the full LoRA weights, the details are muddy; forms were clearer in DBL-MID6. The color is very black and white.

  • SINGLE: SINGLE-1to10 and SINGLE-11to20 learned something with stronger outlines.

Hands

Sign of the Horns

I didn't train this model, but I thought it was perfect for testing, as LoRA training might cause the model to forget how to render hands and feet.

Context

  • Image Count: 16 (13 repeats)

  • Captions: Trigger Word + Danbooru Tags

  • Trigger Word: hud_m3tal_h4ndsv2

  • Prompt 1: A high-resolution photograph of a man. He is raising his left hand in a heavy metal rock and roll hand gesture of devil horns, hud_m3tal_h4nds sign with the index and pinkie finger extended upwards, while the middle and ring fingers are held down by the thumb.

  • Prompt 2: photo of a woman with long brown hair, wearing a sweater, baggy jeans, leather jacket, she is making a heavy metal rock and roll hand gesture of devil horns, hud_m3tal_h4nds sign with the index and pinkie finger extended upwards, while the middle and ring fingers are held down by the thumb. There is a cinematic element, at a concert

Notes

  • DBL-FRONT7: In the example with the man, his fingernail is missing, he's slightly more out of focus, the horns on his head are slightly darker and he now has a suggestively "metal" wristband...

  • DBL-MID6: In the example with the man, his fingernail is missing and he's slightly more out of focus. In the example with the woman, her ring is gone and her face is in slightly darker shadow.

  • DBL-TAIL6: I think this is the strongest layer group. However, it is inconsistent and less distinct without SINGLE-1to10.

  • SINGLE: SINGLE-1to10 seems to be pretty important here.

People

I saved these for last, as testing facial accuracy can be hard. As I understand it, most of the differences that matter for identifying people lie in the triangle between the eyes and the mouth, in particular the nose (see super-recognizers). I am not focusing on bodies, as they are usually less diverse.

George Clinton

This is a model I trained, so I know how it works. It was mostly trained on cropped faces, but there are a few images of his body too. Half of the dataset was black and white. It should be mentioned that Flux was likely trained on some images of him, although those were probably of him when he was older and heavier.

Context

  • Image Count: 28

  • Captions: Trigger Phrase

  • Trigger Phrase: George Clinton

  • Prompt: A high-resolution photograph of George Clinton. He has a short beard, short mustache and short curly black hair. He's wearing red lipstick on his face, purple eyeshadow and has a fake mole on his left cheek. He is wearing a formal black and yellow checkered suit, a light yellow dress shirt and a black tie with white polka-dots. He is holding a card that has the bold red text "WOOF!". His head is tilted slightly and his eyes are wide, as if he is crazy.

Notes

  • DBL-FRONT7: It looks more realistic than without, but it doesn't really look like him at all.

  • DBL-MID6: His face is a bit "waxy" and "off", but I think his nose is more accurate than in DBL-TAIL6. I strongly suspect this is because I used an older upscaling AI to fix many of the images of him I cobbled together. If you saw the dataset, I think you could argue that these middle layers over-fitted on the training data. Anyway, the wings of his nose and the base of the bridge of his nose are the correct width. The eyebrow shape, how the beard attaches to the cheeks, and the lips look right to me. The only "issue" is that the head feels a bit long.

  • DBL-TAIL6: Skin detail and tone look great, but the face looks off to me. It is too round, the beard is cut too perfectly, the nostrils are not quite upturned enough, the base of the bridge of the nose is not quite wide enough, and the wings of the nose are too wide for the slight smile.

  • SINGLE: SINGLE-1to10 and SINGLE-11to20 learned something...

Sandor Clegane

I didn't train this, but I thought it would be interesting to compare. I am only showing the layers that seemed to make a difference. I don't really know why SINGLE7 and SINGLE20 seemed to do nothing when activated: they were the blocks listed in the LoRA info node (output below), yet only DBL-FRONT7-2 and DBL-FRONT7-3 had any impact.

-------[Single blocks(DiT)] (2, Subs=2)-------
SINGLE7: 1
SINGLE20: 1
-------[Base blocks] (0, Subs=0)-------
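To test only the blocks that the info node lists, the weight string can be generated from that output instead of typed by hand. This is a hypothetical sketch: the line format ("SINGLE7: 1") is copied from the output above, but the 58-entry layout (1 base entry, then Flux Dev's 19 double and 38 single blocks) is my assumption, and since I'm not sure whether the node numbers blocks from 0 or from 1, that is left as a parameter.

```python
import re

# Assumed layout: 1 base entry, then 19 double blocks, then 38 single blocks.
NUM_DOUBLE = 19
NUM_SINGLE = 38
# Matches lines such as "SINGLE7: 1" from the LoRA info output.
SINGLE_LINE = re.compile(r"^SINGLE(\d+):\s*([0-9.]+)$")


def weights_from_info(info: str, one_based: bool = False) -> str:
    """Build a block-weight string enabling only the single blocks listed in info.

    Hypothetical helper: the base entry is kept at 1, every other block at 0.
    """
    weights = [1.0] + [0.0] * (NUM_DOUBLE + NUM_SINGLE)
    for line in info.splitlines():
        match = SINGLE_LINE.match(line.strip())
        if match is None:
            continue  # skip section headers like "-------[Base blocks] ...-------"
        index = int(match.group(1)) - (1 if one_based else 0)
        if not 0 <= index < NUM_SINGLE:
            raise ValueError(f"single block index out of range: {line!r}")
        weights[1 + NUM_DOUBLE + index] = float(match.group(2))
    return ",".join(f"{w:g}" for w in weights)


info = """-------[Single blocks(DiT)] (2, Subs=2)-------
SINGLE7: 1
SINGLE20: 1
-------[Base blocks] (0, Subs=0)-------"""

print(weights_from_info(info))                  # treats SINGLE7 as index 7
print(weights_from_info(info, one_based=True))  # treats SINGLE7 as index 6
```

Comparing both outputs against the presets is one way to check which numbering the node uses.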

Context

  • Image Count: 10

  • Captions: ?

  • Trigger Phrase: Sandor Clegane

  • Prompt: Sandor Clegane working at homedepot as a cashier with a forced smile, close up, marketing material

Notes

  • DBL-FRONT7-2: This is mostly correct. The tip of the nose is a bit rounder than in the final image.

  • DBL-FRONT7-3: This looks less correct. It reminds me of the LoRAs trained on all blocks, where the DBL-FRONT7 group makes the output photorealistic.

  • ALL: The hairline is a bit smoother. (Is this more or less accurate? Is this just random variation?)

