In general, it is easier to get an AI model to draw something creative when the prompt is not too strict about the scene. For example, if you ask the AI to draw scenic art of a mountain with waterfalls, colorful flowers and birds flying, it will produce some stunning pictures. But what if you want to tightly control the scene? While it is not yet possible to make an AI draw a scene exactly as specified, there are some things you can do with carefully constructed prompts and a high CFG value.
Consider a scene where four balls (blue, red, green and yellow) are arranged on a white floor. Let us try to describe this scene to the AI.
Prompt: Hyperrealistic photo of four balls on a white floor. blue, red, green, yellow.
CFG: 100%
Nice! The image shows exactly what we wanted: four balls on a white floor. But will the AI produce a similarly good result every time? Let us try again.
No! One of the balls is red instead of blue. But at least there are only four balls. There also seems to be no guarantee on how the balls are placed together. This is because we did not tell the AI how the balls should be positioned.
One way to tell the AI how things are to be positioned is to use keywords such as "next", "before", "behind", "ahead" and "opposite". For example, if you say "(a blue ball), next to (a red ball)", the AI will most likely place a blue ball and then a red ball relative to it. Exactly where depends a lot on the camera angle.
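If you build many prompts like this, it can help to assemble them from a list of (keyword, subject) pairs. The snippet below is only a minimal sketch in plain Python: the function name `build_relational_prompt` and its arguments are made up for illustration and are not part of any image generator. It only produces the prompt text; you would still paste the result into your tool of choice.

```python
def build_relational_prompt(scene, placements):
    """Join subjects with position keywords into a single prompt string.

    scene      -- opening sentence describing the overall picture
    placements -- list of (keyword, subject) pairs; use None as the keyword
                  for the first subject, which everything else is relative to
    """
    parts = []
    for keyword, subject in placements:
        term = f"({subject})"  # parentheses around each subject, as in the examples
        parts.append(term if keyword is None else f"{keyword} {term}")
    return f"{scene} {', '.join(parts)}."


prompt = build_relational_prompt(
    "Hyperrealistic photo of four balls on a white floor.",
    [
        (None, "blue"),
        ("before", "red"),
        ("behind", "green"),
        ("opposite to", "yellow"),
    ],
)
print(prompt)
```

Keeping the subjects in a list makes it easy to swap keywords or reorder the balls and compare the pictures you get.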
Let us try this approach to position the four balls relative to each other.
Prompt: Hyperrealistic photo of four balls on a white floor. (blue), before (red), behind (green), opposite to (yellow).
CFG: 100%
Great! Four differently colored balls. But are they positioned correctly? Yes, they are. If you expected to see something different, that is because we didn't specify any camera angle or position in the prompt, which left the AI free to choose one of many possible arrangements that satisfy the prompt. In this case, starting from the blue ball, there is a red ball before it, a green ball behind it and a yellow ball opposite to it. This is very important: when you describe the position of the next object or subject in a scene, that position is almost always relative to the other object(s) in the scene. We started with the blue ball and then said "before". If you look closely, all three other balls are in fact before the blue ball. So what about the red ball? Yes, the other three balls are behind it.
So far, there is still no guarantee that the AI will choose the arrangement we have in mind. Specifying a camera angle may give us a bit more control. Let us try that.
Prompt: Hyperrealistic front view of four balls on a white floor. (blue), before (red), behind (green), opposite to (yellow).
CFG: 100%
Now the result is more predictable! By saying "front view" we are asking the AI to place the camera in front, at our viewpoint. Then, by describing each ball from left to right, we are guiding the AI on how to place the balls. Will this always produce the same result? Maybe not, because the AI is not good at this. But this approach will produce better results than writing lengthy prompts. Also notice that none of the prompts above repeat words; this helps ensure that we almost always get only four balls in the picture.
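To check that a prompt does not repeat words before you submit it, a small helper like the one below may be useful. It is a rough sketch in plain Python; the name `repeated_words` and the list of ignored filler words are my own assumptions, so adjust them to your needs.

```python
import re
from collections import Counter


def repeated_words(prompt, ignore=frozenset({"a", "the", "of", "on", "to"})):
    """Return the words that occur more than once in the prompt, sorted.

    Words in `ignore` (an assumed list of common filler words) are not counted.
    """
    words = re.findall(r"[a-z]+", prompt.lower())
    counts = Counter(w for w in words if w not in ignore)
    return sorted(w for w, n in counts.items() if n > 1)


front_view = (
    "Hyperrealistic front view of four balls on a white floor. "
    "(blue), before (red), behind (green), opposite to (yellow)."
)
print(repeated_words(front_view))  # [] because no word repeats

wordy = "four balls, a blue ball, a red ball, four balls"
print(repeated_words(wordy))  # ['ball', 'balls', 'four']
```

An empty list means the prompt names each thing only once, which is the pattern all the prompts in this post follow.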
Hope this is helpful. If you have comments, suggestions or questions, please comment below.