So, Flux is great with prompt adherence, right? Right… but writing directions can be tricky for the model. How would Flux interpret “A full body man with a watch on his right wrist”? It will most probably output a man in front view with the watch on his LEFT wrist, which ends up on the RIGHT side of the image. That’s not what we want.
"Full body shot of a man with a watch on his right wrist" 0 out of 2 here
Sometimes Flux gets it right, but often it doesn’t. And that’s mostly because of how we write our prompts.
A warning first: this is in no way perfect. Based on my experimentation, it helps, but it won’t work 100% of the time.
Describing body parts using the character’s perspective (like “his left”) leads to confusion, because the model tends to read left and right as sides of the image. Instead, it’s better to use the image’s perspective. For example, say “on the left side” instead of “his left.” Adding “side” helps the model a lot. You can also reference specific areas of the image, like “on the left bottom corner”, “on the top-left corner”, “on the center”, or “on the bottom”.
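If you build prompts in a script, here is a rough sketch of that switch from the character's perspective to the image's perspective. The function names and the "front"/"back" view labels are my own assumptions for illustration, not anything Flux itself understands. The only logic is geometry: a character facing the camera is mirrored, a character seen from behind is not.

```python
# Hypothetical helpers: names and view labels are illustrative assumptions.
FLIP = {"left": "right", "right": "left"}


def image_side(character_side: str, view: str = "front") -> str:
    """Map the character's own left/right to the side of the image it lands on."""
    side = character_side.lower()
    if side not in FLIP:
        raise ValueError(f"expected 'left' or 'right', got {character_side!r}")
    if view == "front":
        # Facing the camera: the character's right is on the image's left.
        return FLIP[side]
    if view == "back":
        # Seen from behind: the character's sides match the image's sides.
        return side
    raise ValueError(f"unsupported view {view!r}; use 'front' or 'back'")


def place(part: str, character_side: str, view: str = "front") -> str:
    """Build an image-perspective phrase that always includes the word 'side'."""
    return f"the {part} on the {image_side(character_side, view)} side"


print(place("wrist", "right"))              # character's right wrist, front view
print(place("hand", "left", view="back"))   # character's left hand, back view
```

For a front view, `place("wrist", "right")` gives "the wrist on the left side", which is the phrasing used in the prompts here.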
"Full body shot of a man with a watch on his wrist on the left side" 0.5 out of 2, getting there
NEVER use “his right [body part]”. “On the left” is already way better than “on his left”, but it still generates a lot of wrong perspectives. More recently I have been experimenting with removing “his/her” from the prompt completely, and I think it works even better.
"Full body shot of a man with a watch on the wrist on the left side" 1 out of 2, better.
Another example would be:
"A warrior man from behind, climbing stepping up a stone. The leg on the left side is extended down, the leg on the right is bent at the knee. He is wearing a magical glowing green bracelet on the hand on the left side. The hand on the right side is holding the sword vertically upward. The background is the entrance of a magical dark cave, with multiple glowing red neon lights on the top-right side corner inside the cave resembling eyes."
Definitely not everything is correct, but it’s more consistent.
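If you have old prompts full of “his right hand” style phrases, a small rewrite pass can catch them before they reach the model. This is only a sketch: `rewrite_sides` and its `view` parameter are hypothetical names I made up, the regex only handles the simple “his/her/their left/right X” pattern with a one-word body part, and the output still needs a read-through.

```python
import re

# Matches phrases like "his right wrist" or "her left hand" (one-word body part).
_POSSESSIVE_SIDE = re.compile(
    r"\b(?:his|her|their)\s+(left|right)\s+(\w+)", re.IGNORECASE
)


def rewrite_sides(prompt: str, view: str = "front") -> str:
    """Replace character-perspective sides with image-perspective phrases.

    view="front": the character faces the camera, so left and right are swapped.
    view="back":  the character is seen from behind, so sides stay as written.
    """
    if view not in ("front", "back"):
        raise ValueError(f"unsupported view {view!r}; use 'front' or 'back'")

    def repl(match: re.Match) -> str:
        side = match.group(1).lower()
        part = match.group(2)
        if view == "front":
            side = "right" if side == "left" else "left"
        # Drop the pronoun and always add the word "side".
        return f"the {part} on the {side} side"

    return _POSSESSIVE_SIDE.sub(repl, prompt)


print(rewrite_sides("Full body shot of a man with a watch on his right wrist"))
print(rewrite_sides("a glowing green bracelet on his left hand", view="back"))
```

The first call turns the original watch prompt into the pronoun-free version, and the second keeps the side unchanged because the warrior is seen from behind.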
For side views, when both body parts are on the same side, you can use foreground and background to clarify:
"A photo of man in side view wearing an orange tank top and green shorts. He is touching a brick wall arching, leaning forward to the left side. His hand on the background is up touching the wall on the left side. His hand in the foreground is hanging down on the left side."
This is way more inconsistent. It’s hit-and-miss most of the time.