Most articles about writing prompts to describe a scene to an AI model, especially an SDXL model that supports inline weights, are outdated! These articles typically suggest repeating words in the prompt to emphasize something. For example, if you are trying to create a piece of puzzle art, the typical suggestion is to repeat the word "puzzle" many times. But this will only confuse an SDXL model. Also, when you describe anything to an AI model, you have to understand how the AI processes your prompt.
Let us use a simple scene to discuss this further. Say you are trying to create puzzle art of Medusa standing with a Roman castle behind her. How would you describe it? Oh, another thing folks suggest is to ask another AI, like Chat GPT. Let us ask Chat GPT!
"Write a text prompt to describe to a generative AI to create a puzzle art of Medusa standing with a roman castle behind her"
Chat GPT: "Generate puzzle art featuring Medusa in a standing pose with a Roman castle in the background. Capture the mystique of Medusa's gaze and the grandeur of the castle. Puzzle art, mythical, ancient Rome, enigmatic."
Ok cool! Let us try that! Here is a sample image produced by Starlight XL for that prompt.
Looks cool! But where is the puzzle art? Shouldn't the AI have created art made out of puzzle pieces, especially after we repeated the keyword "puzzle" a couple of times? No! With this prompt we only made the AI model "feel puzzled".
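If you want a quick way to spot repeated keywords before sending a prompt, here is a minimal sketch in Python. The function name `repeated_keywords` and the small stop-word list are my own assumptions for illustration, not part of any SDXL tool, and the check is only a rough word count.

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical stop-word list; extend it for your own prompts.
STOP_WORDS = {"a", "an", "the", "of", "in", "with", "and", "to"}


def repeated_keywords(prompt: str) -> Dict[str, int]:
    """Return every non-stop word that appears more than once in the prompt."""
    # Lowercase the prompt and split it into simple word tokens.
    words = re.findall(r"[a-z']+", prompt.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return {word: n for word, n in counts.items() if n > 1}


chat_gpt_prompt = (
    "Generate puzzle art featuring Medusa in a standing pose with a Roman "
    "castle in the background. Capture the mystique of Medusa's gaze and the "
    "grandeur of the castle. Puzzle art, mythical, ancient Rome, enigmatic."
)
print(repeated_keywords(chat_gpt_prompt))
```

For the Chat GPT prompt above, this flags "puzzle", "art" and "castle" as repeated, which is a hint that the prompt leans on repetition rather than structure.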
OK, let us try to rephrase the prompt: "(Realistic art made of pieces of puzzle). (a medusa, majestic, standing), behind (a roman castle, majestic)." BTW, those brackets tell the AI that the words within are related. Recent versions of SDXL models also support adding a weight, but we will discuss that in another article. For now, let us keep it simple.
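If you build prompts like this often, the bracket groups can be assembled from keyword lists. The following is only a sketch: `group` and `build_prompt` are hypothetical helper names I made up, and the code does nothing more than join strings in the same shape as the prompt above.

```python
from typing import Sequence


def group(keywords: Sequence[str]) -> str:
    """Join keywords with commas and wrap them in one pair of brackets."""
    return "(" + ", ".join(keywords) + ")"


def build_prompt(
    style: Sequence[str],
    subject: Sequence[str],
    background: Sequence[str],
    relation: str = "behind",
) -> str:
    """Assemble 'style. subject, relation background.' from keyword groups."""
    return f"{group(style)}. {group(subject)}, {relation} {group(background)}."


prompt = build_prompt(
    style=["Realistic art made of pieces of puzzle"],
    subject=["a medusa", "majestic", "standing"],
    background=["a roman castle", "majestic"],
)
print(prompt)
```

Keeping each group as its own list makes it easy to add or drop a keyword without touching the rest of the prompt.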
Compared to the previous long text prompt, this time the AI understood that we wanted puzzle art with a Roman castle in the background. But why is the entire picture not made out of puzzle pieces? This is where understanding how an AI reads the prompt helps. We said we wanted puzzle art, and then we said we wanted Medusa in a standing pose. After that, we only said that the background of the art should be a Roman castle. Specifically, we did not tell the AI that the entire artwork is made of puzzle pieces.
Maybe the comma before "behind" is confusing the AI. Let us try removing it: "(Realistic art made of pieces of puzzle). (a medusa, majestic, standing) behind (a roman castle, majestic)." And here is what the AI produced.
Yay! Finally, puzzle art! But wait, where is the Roman castle? Now let us try to rephrase that prompt a bit differently: "(Realistic art made of pieces of puzzle). (a medusa, majestic, standing) with (a roman castle, majestic, behind)."
Finally! We are seeing puzzle art of Medusa standing before a castle. Of course, we could refine this further. I, at least, expected the image to contain only the puzzle art, and a rectangular one at that. But my prompt did not say that anywhere, so obviously we gave the AI some room to use its own imagination. Maybe this is exactly what you want, or maybe you want the AI to draw the image exactly the way you picture it. We will explore that in the next article.
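To compare the phrasings side by side, it can help to keep them in one place and feed them to your generator one at a time. This is a small sketch in Python; the labels and variable names are mine, and the code only prints the prompts, it does not call any image model.

```python
# The style and subject groups stay the same in every variant.
STYLE = "(Realistic art made of pieces of puzzle)."
SUBJECT = "(a medusa, majestic, standing)"

# Only the way the castle is attached to the subject changes.
variants = {
    "comma before 'behind'": f"{STYLE} {SUBJECT}, behind (a roman castle, majestic).",
    "no comma": f"{STYLE} {SUBJECT} behind (a roman castle, majestic).",
    "'behind' inside the castle group": f"{STYLE} {SUBJECT} with (a roman castle, majestic, behind).",
}

for label, prompt in variants.items():
    print(f"{label}: {prompt}")
```

Changing one thing per variant makes it easier to tell which wording caused which change in the image.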
Conclusion:
1. Long text prompts, especially prompts with repeated keywords, are definitely not the best way to talk to an SDXL-based AI art generator. Instead, a list of simple keywords works best.
2. The order of the keywords matters.
3. Describing a position the way you see it is not how the AI will see it. Notice that in the final prompt we said "behind" and not that Medusa is standing "before the castle." If you say "before", the AI would think that Medusa should be facing the castle, and you will most likely get an image of Medusa's back with a castle in the background.
Hope this helps. Do you have other cool tips, suggestions or questions? Please comment or send me a message.