I have said this before in my other articles: don't write paragraphs to describe a scene to an AI. Short and sweet is best! If you practice this skill, you can get the AI to create the scene you want without having to try multiple times. Someone commented on another article of mine that it is possible to get a scene right by trying different random seeds. Sure, you can! But that is like winning the Powerball lottery.
Here is the general format I use for describing a scene.
<quality, overall description of the scene> [:|of] [character1, attributes, action], [<relative position>|a] [character2, attributes, action].
Let us try a few simple examples first: scenes involving a single character. For the sake of simplicity, I am going to pick popular action hero scenes.
Scene #1 - Hulk standing on a busy New York city road.
Hyperrealistic photoshoot: a (hulk-standing-on-road), behind a (out-of-focus-new-york-city)
There are a few important aspects to notice here. By saying "photoshoot", we not only tell the AI that we want a good-quality image, we also open up the opportunity to describe the focus. Then we say "hulk-standing-on-road" to describe what the character is doing and where he is placed. Then we say "out-of-focus-new-york-city" to indicate that he is standing on a busy road and that the view of the city should be out of focus, thereby giving Hulk the main focus. By saying "behind a", we are saying that Hulk is in front, closest to the camera, with the city behind him.
Scene #2 - Batman sitting on a building rooftop, looking down on a busy street.
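To make the general format concrete, here is a minimal Python sketch that assembles a prompt from its parts and reproduces the Scene #1 prompt. The function name `build_prompt` and its parameters are my own illustration, not part of any Stable Diffusion tool; the result is just a string you paste into whatever interface you use.

```python
def build_prompt(quality, subjects, separator=":"):
    """Assemble '<quality>[:|of] <lead> (<terms>), <lead> (<terms>), ...'.

    subjects is a list of (lead, terms) pairs. lead is "a" or a relative
    position such as "behind a"; terms are the hyphenated descriptions.
    This helper is hypothetical and only does string formatting.
    """
    groups = [f"{lead} ({', '.join(terms)})" for lead, terms in subjects]
    # ":" attaches directly to the quality text, "of" needs spaces around it.
    joiner = ": " if separator == ":" else f" {separator} "
    return quality + joiner + ", ".join(groups)


prompt = build_prompt(
    "Hyperrealistic photoshoot",
    [
        ("a", ["hulk-standing-on-road"]),
        ("behind a", ["out-of-focus-new-york-city"]),
    ],
)
print(prompt)
```

Keeping the quality, the lead word and the hyphenated terms as separate inputs makes it easy to change one part of the scene without retyping the rest.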
Hyperrealistic photoshoot: a (batman-seated-on-rooftop, looking-down), a (busy-new-york-city-street, below, out-of-focus)
This prompt is similar to the previous one, but it shows how actions and attributes can be written as separate terms. The prompt "batman-seated-on-rooftop, looking-down" could have been written as "batman-looking-down-seated-from-rooftop". The prompt "busy-new-york-city-street, below, out-of-focus" could have been written as "out-of-focus-busy-new-york-city-street-below". But sometimes we may want to split these prompts, especially when we want to add other attributes, as we will be doing in the next scene.
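The difference between the fused and the split form can be sketched in a few lines of Python. The helper `group` is a hypothetical name of mine, and "at-night" is an attribute I made up purely to show how the split form takes an extra term; it is not from any of the scenes in this article.

```python
def group(*terms, lead="a"):
    """Render one parenthesised group, e.g. 'a (term1, term2)'."""
    return f"{lead} ({', '.join(terms)})"


# Fused: everything about the street lives in one hyphenated term.
fused = group("out-of-focus-busy-new-york-city-street-below")

# Split: subject, position and focus are separate terms.
split = group("busy-new-york-city-street", "below", "out-of-focus")

# Adding an attribute to the split form is a one-term change.
# "at-night" is a hypothetical attribute used only for illustration.
extended = group("busy-new-york-city-street", "below", "out-of-focus", "at-night")

for text in (fused, split, extended):
    print(text)
```

With the fused form you would have to decide where in the long hyphenated term the new attribute belongs; with the split form you simply append it.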
Scene #3 - Spiderman in busy New York city.
Hyperrealistic photoshoot: a (spiderman-jumping-from-rooftop), a (busy-new-york-city-street, below, out-of-focus)
Again a similar scene, but there are a few important things to call out here. By saying "jumping-from-rooftop", we are placing Spiderman far above the street. By saying "busy-new-york-city-street, below, out-of-focus", we are making sure that a busy street is seen below and out of focus. If you are curious, try changing the prompt to "out-of-focus-busy-new-york-city-street-below" and see what happens.
Scene #4 - Lastly, a slightly more complex scene with three different camera focuses.
Hyperrealistic photoshoot: a (superman-flying-high, main-focus), a (busy-new-york-city-street, below, out-of-focus), a (rocket-falling-down, in-focus)
Notice how we are instructing the AI to use three different focuses here. We want the busy New York city street in view, but out of focus. We want Superman to be the main focus, and we want the rocket to be in focus as well.
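One way to keep track of who gets which focus is to write the scene down as data first and render the prompt from it. The sketch below is only string formatting under my own assumptions: the dictionary keys `what`, `extra` and `focus` and the function `render_scene` are hypothetical names, not something an SDXL tool expects.

```python
def render_scene(quality, subjects):
    """Render a prompt where every subject carries its own focus term."""
    groups = []
    for s in subjects:
        # Order inside a group: what it is, optional extras, then focus.
        terms = [s["what"], *s.get("extra", []), s["focus"]]
        groups.append(f"a ({', '.join(terms)})")
    return f"{quality}: " + ", ".join(groups)


subjects = [
    {"what": "superman-flying-high", "focus": "main-focus"},
    {"what": "busy-new-york-city-street", "extra": ["below"], "focus": "out-of-focus"},
    {"what": "rocket-falling-down", "focus": "in-focus"},
]

prompt = render_scene("Hyperrealistic photoshoot", subjects)
print(prompt)

# Quick overview of which subject was given which focus.
focus_by_subject = {s["what"]: s["focus"] for s in subjects}
print(focus_by_subject)
```

Writing the focus as its own field makes it obvious when two subjects accidentally compete for the main focus.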
What about SD 1.5?
Well, all of these were created using an SDXL model. If you are interested in how to create these scenes using an SD 1.5 based model, please let me know.
Hope you found this useful. Do you have tips, suggestions or questions? Please drop a comment below.