Prompting in Stable Diffusion

After almost a year of working with Stable Diffusion, I have experimented with numerous approaches to prompting and have arrived at what I believe to be the correct approach. Let's take a look at it in details.

The Power of Words:

The words we use in prompting are called tokens. Each token has its own power, which depends on its frequency of occurrence in the dataset. Without diving into the technical details of the CLIP tokenizer, you can refer to this article if you're interested in learning more about how it works from the inside.

Key Considerations: The phrase "Game art of a beautiful girl" consists of 6 individual tokens (11 if we count spaces, but that's not important). Following this logic, we understand that particles like "of" and "a" also carry weight and influence the image. From this, we can infer that neural networks are machines, and it's not necessary to use formal English language when inputting prompts. Sometimes, using a prompt like "Game art, beautiful girl" may yield better results. This doesn't mean we should completely exclude all particles, but I recommend using them only when necessary, based on the sentence's logic.

In addition to a token's inherent strength, its position in the prompt is also weighted. Tokens at the beginning have greater weight than tokens at the end. It's important to understand this, as a weak token at the end of the prompt may have no impact on the image. Conversely, a strong token at the beginning can completely determine the outcome.

To control the strength of a token, you can use the construction (token:1.0), where the number represents the strength of the token. 0 - no influence, 1 - normal weight. I usually don't go past 1.5. Experimenting with different strength values can help you fine-tune the desired level of control over the tokens in your prompts.

Prompt Structure:

Content Type | Object | Style | Extra | Color

Example Prompt:

Minimalism Icon of fantasy magic book in [Painting Low poly:Vector illustration:0.3] style, <lora:Pecha_Icons_LORA_V4:0.8>, best quality, Trending on Artstation, masterpiece, dramatic lighting, red hue

Content Type: (Minimalism Icon)

Here, we describe the type of art we want to create. It's important to specify the art type at the beginning, as it influences all subsequent details.

In general, content type can be broadly divided into two main categories: Photography and Art. These two categories serve as the foundation, and all other subcategories can be considered as branches stemming from them. Photography encompasses various styles such as landscape, portrait, still life, and more, focusing on capturing real-world subjects through the lens. On the other hand, Art includes a wide range of artistic creations, including paintings, illustrations, digital art, sculptures, and other forms of visual expression. These two main groups provide a framework for organizing and classifying different types of visual content.

Object (fantasy magic book)

This is the second and most important block. Here, we describe the objects we want to see in our art. Additionally, object stylization can be used.

If you need a specific object, you can try describing it in detail. For example, fantasy book with golden ornament on the cover and rivets. However, it's important to remember that the neural network is an excellent fantasizer, and sometimes it can generate objects that you couldn't even imagine. So personally, I prefer to specify only the main details and give the neural network room for creativity.

Style [Painting Low poly:Vector illustration:0.3]

In this block, we determine the artistic style of our art. In general, there are numerous styles in art. I suggest using them consciously, highlighting the advantages of each style, and using them as needed. For example, Low poly will create angular objects with minimal detailing, yet they won't appear flat. Cartoon will provide exaggerated forms and vibrant colors. I have compiled the main styles discovered during my work and conducted thorough testing. I recommend doing the same, recording the seed and testing different styles using the XYZ Plot to understand how they affect the same image.

This block significantly influences the final quality of the art, and it allows me to achieve highly detailed images. Since the topic of style is extensive, I plan to write a separate article dedicated entirely to this subject.

This is an excellent place to describe additional enhancements for your art. This block is relatively far from the beginning of the prompt, and tokens here have lower strength, but they still contribute to the image. You can include information about lighting in the scene, improvements, quality, mood, trends on different platforms, camera information, film type and put you LoRAs here. This is a good area for experimentation. Over time, you will have your own collection of tokens that can be placed here. As a start, I would recommend going through the pictures you like on Civit and see what people use in such cases.

Color: (red hue)

Here, we describe the color tones we want to see in our art. Why do we place color information at the very end? Because the model has been trained on a vast amount of visual information, and imagine how many images with the color red are present in the dataset? A lot... As I mentioned earlier, tokens have their power, and a token describing color is one of the strongest. If we place color information at the beginning, the model may simply fill everything with one color, and we'll lose many details.

Here's an example with the same seed. In the first image, color information is placed at the beginning, and we can see that we lose some details and additional colors. In the second image, color is placed at the very end, but the token is still strong enough to draw a red book. At the same time, we get more details and additional colors.

In fact, the main section of this structure is only the Object. Everything else can be skipped if not needed. This usually happens in well-trained models where there is no need for Style and Extra.

Summary:

Tokens in the prompt have their power based on their frequency of occurrence in the dataset.
The correct selection and placement of tokens in the prompt can significantly impact the quality and level of detail in the final image.
The structure of the prompt, including content type, objects, style, additional details, and color, plays an important role in shaping the end result.
Experimenting with different tokens and their combinations allows achieving the desired artistic effect.
Optimal utilization of styles and experimentation with them contribute to obtaining more detailed and engaging images.
Proper placement of color information in the prompt allows preserving detail and avoiding an image dominated by a single color.
Additional tokens can be used to enhance the artwork, add information about lighting, quality, and other aspects.

Thank you for your attention! More articles you can find on my Patreon