Short vs Long Prompts

I generally recommend keeping prompts as short as possible. However, long prompts do have their advantages, some of which I explore here.

"Forest, night" is what I would call a minimalist prompt. You should only use a prompt that short if you can trust your setup to give you the essence of what you are after or in order to test a setup or embedding. I primarily use such prompts ("glamour photo of {woman}") to see what kind of "idea" (my setup of) SD has of a particular concept.

Next, there are succinct prompts (roughly 30 to 55 tokens in the examples below). If you have a fairly precise idea of what you're after, this prompt length is most likely to deliver it.

Finally, there are elaborate evocative prompts. These are long prompts that contain a lot of information about subject, setting, mood and various details. By analogy, you essentially give SD an entire novel and expect to get stills from the movie adaptation in return.

The best thing about elaborate evocative prompts is that, in the age of ChatGPT, they are fast and easy to generate and allow you to get some stunning results without the effort of trying to figure out any of the details. The downside is that the resulting images become difficult or next to impossible to tweak. I also believe that well-engineered succinct prompts tend to beat elaborate evocative prompts in output quality, because the signal-to-noise ratio is so much better.

In fact, a good way for short and long prompts to work together is to use a long prompt to get a feel for what kind of image you want, then go through that prompt and pick only high-quality and relevant information to use as the final succinct prompt (I try this in the third example below).

Let's look at some examples.

Here is a picture I found on Pixiv by a user named Utajazz (reproduced here for the fair-use purpose of criticism and commentary):

I love this picture. They are so cute and joyful. Here is the prompt used to generate it:

As a street photographer, you aim your camera at a group of high school girls on a picnic trip to an amusement park. You capture their youthful energy and exuberance as they race from one ride to the next, laughing and screaming with delight.

Their hair is styled in a variety of ways, from long braids to high ponytails, and their outfits are a mix of casual wear and school uniforms. Some of them are wearing plaid skirts and blouses, while others are dressed in shorts and t-shirts.

Their footwear ranges from sneakers to sandals to dress shoes, and some of them have even donned colorful stockings and knee-high socks to complete their playful looks.

Against the backdrop of the amusement park, you capture images of them riding roller coasters, eating cotton candy, and trying their luck at the game booths. You also snap candid shots of them chatting and laughing with each other, enjoying the freedom and camaraderie of their high school years.

Overall, you capture a glimpse of their youthful spirit and carefree joy, frozen in time through your lens.

This makes me smile almost as much as the picture. SD captures the hairstyles (long braids and ponytail), the casual wear (plaid skirts), the "chatting and laughing with each other" and the "backdrop of the amusement park". Most importantly it "captures their youthful energy and exuberance, their youthful spirit and carefree joy".

What about racing from one ride to the next, riding roller coasters, eating cotton candy, and trying their luck at the game booths? SD will often ignore even essential aspects of very short prompts, so it is no surprise that it drops some details of a long one. Also, depicting so many activities in one picture would (ironically) destroy its "carefree spirit" by creating some weird collage (not to mention that SD is usually not very good at "activities").

But it's worth asking what purpose "you capture images of them riding roller coasters, eating cotton candy, and trying their luck at the game booths" serves here. If you generate several images, some will probably depict some of those activities. In that case you are prompting a batch or series more than a single image. Or perhaps it just helps enhance the overall mood. But for that purpose, is it really going to work better than "joyful \(mood\)"?

If you want some picture of joyful girls at an amusement park, this prompt works beautifully. If you need to get more specific (or want to change specific details), the longer the prompt gets, the harder a time you will have.

Let me illustrate this. Let's call in the Space Marines.

(All of the examples below use these settings in addition to the positive prompt:

<lora:add_detail_v5:0.7> <lora:FilmG2:0.5><lora:wowifierV3:0.1><lora:detailmaker:0.1><lora:more_details:0.1>

Negative prompt: photograph by bad-artist, kkw-Extreme-Neg-PH, ng_deepnegative_v1_75t, bad_prompt_version2, negative_hand-neg, easynegative, (worst quality, low quality, normal quality:1.7), lowres, lr:0.9

Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 7, Model: endlessreality_v2)
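If you script your comparisons, it helps to keep these shared settings in one place so that only the positive prompt varies between runs. The following is a minimal sketch: the dictionary keys and the `build_request` helper are hypothetical names chosen for illustration and do not correspond to any particular tool's API. Only the values are taken from the settings above.

```python
# Shared settings copied from the list above. The key names are illustrative
# assumptions, not the field names of any specific UI or API.
SHARED = {
    "loras": {
        "add_detail_v5": 0.7,
        "FilmG2": 0.5,
        "wowifierV3": 0.1,
        "detailmaker": 0.1,
        "more_details": 0.1,
    },
    "negative_prompt": (
        "photograph by bad-artist, kkw-Extreme-Neg-PH, ng_deepnegative_v1_75t, "
        "bad_prompt_version2, negative_hand-neg, easynegative, "
        "(worst quality, low quality, normal quality:1.7), lowres, lr:0.9"
    ),
    "steps": 30,
    "sampler": "DPM++ 2M Karras",
    "cfg_scale": 7,
    "model": "endlessreality_v2",
}


def build_request(positive_prompt, shared=SHARED):
    """Combine one positive prompt with the shared settings.

    LoRA entries are appended to the prompt as <lora:name:weight> tags,
    matching the notation used in the settings above.
    """
    lora_tags = " ".join(
        f"<lora:{name}:{weight}>" for name, weight in shared["loras"].items()
    )
    request = {key: value for key, value in shared.items() if key != "loras"}
    request["prompt"] = f"{positive_prompt} {lora_tags}"
    return request


if __name__ == "__main__":
    print(build_request("forest, night")["prompt"])
```

Keeping everything except the positive prompt fixed is what makes the short-versus-long comparisons below meaningful.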

First example: Space Marines

Succinct Prompt (49 Tokens):

a squad of Adeptus Astartes attacked by a horde of Chaos Cultists, blasted wasteland, battlecruisers and fighters In the skies above, (cinematic bokeh, dynamic range, vibrant colors), (shot on Kodak Vision3 IMAX)

Elaborate Evocative Prompt (436 Tokens, 312 words):

The planet was a blasted wasteland, scarred by centuries of war and conflict. In the skies above, great battlecruisers and fighters fought and died, their engines roaring and their guns blazing.

On the ground, a squad of Space Marines made their way through the ruins of a once-great city, their armor gleaming and their bolters at the ready. They were the Angels of Death, the mighty warriors of the Adeptus Astartes, and they had been sent to this planet to crush the rebellion that had risen against the Emperor.

The squad was led by Captain Gabriel, a fierce and noble warrior who had fought countless battles in the name of the Emperor. As they advanced through the ruins, they were attacked by a horde of Chaos Cultists, their twisted bodies writhing with dark and ancient power.

Captain Gabriel and his Space Marines fought with all their might, their bolters and chainswords cutting a bloody swath through the enemy ranks. They fought with the skill and determination of true heroes, and soon the Cultists lay dead at their feet.

But the battle was not over yet. From the shadows, a greater threat emerged: a towering Chaos Lord, its body twisted and corrupted by the power of the Warp. The Chaos Lord attacked with all its might, its blades and spells raining down upon the Space Marines.

Captain Gabriel knew that this was a battle he could not win alone. He called upon the power of the Emperor, and a bright light shone from his armor. The Chaos Lord was driven back, and the Space Marines emerged victorious.

The rebellion was crushed, and the Emperor's will was restored to the planet. Captain Gabriel and his Space Marines stood victorious, their armor stained with the blood of their enemies. They had fought and died for the Emperor, and they were true heroes of the Imperium.

OK, so neither set features "chaos cultists" (although the second set, from the long prompt, has hints of chaos creatures). The long prompt creates more diverse compositions. In my opinion, the short prompt depicts the subject matter ("Adeptus Astartes") somewhat more accurately. I know I'm biased (towards short prompts), but I think the first set looks cooler.

Now, what if, for some reason, I need them to wear red armor? Let's add "(in red armor:1.2)" after the first mention of "Adeptus Astartes" (succinct prompt) or "Space Marines" (elaborate prompt) and see what happens:

Succinct prompt: red armor, very similar composition. This is somewhat lucky. There is a good chance SD will change the composition even when a short prompt is changed only mildly.

Elaborate prompt: very similar composition, no red armor. "(in red armor:1.2)" might as well not be there. It just gets drowned out by all the other information.
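The edit used in this test (placing a weighted term right after the first mention of the subject) is easy to script if you want to repeat it across many prompts. This is a minimal sketch, not the method I used at the time: the function name `insert_after_first_mention` is made up for illustration, and the two prompt strings are shortened excerpts of the prompts above.

```python
import re


def insert_after_first_mention(prompt, subjects, tag):
    """Insert `tag` right after the earliest occurrence of any subject phrase.

    Matching is case-insensitive. If no subject phrase occurs, the prompt
    is returned unchanged.
    """
    pattern = re.compile("|".join(re.escape(s) for s in subjects), re.IGNORECASE)
    match = pattern.search(prompt)
    if match is None:
        return prompt
    end = match.end()
    return prompt[:end] + " " + tag + prompt[end:]


SUBJECTS = ["Adeptus Astartes", "Space Marines"]
TAG = "(in red armor:1.2)"

# Shortened excerpts of the succinct and elaborate prompts.
short_prompt = (
    "a squad of Adeptus Astartes attacked by a horde of Chaos Cultists, "
    "blasted wasteland"
)
long_excerpt = (
    "On the ground, a squad of Space Marines made their way through the ruins. "
    "They were the mighty warriors of the Adeptus Astartes."
)

short_with_tag = insert_after_first_mention(short_prompt, SUBJECTS, TAG)
long_with_tag = insert_after_first_mention(long_excerpt, SUBJECTS, TAG)

if __name__ == "__main__":
    print(short_with_tag)
    print(long_with_tag)
```

In the long excerpt the term lands after "Space Marines", because that phrase appears before "Adeptus Astartes" in the story.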

Second example: Sword and Sorcery heroine

Succinct Prompt (31 Tokens):

female warrior, sword and sorcery, dark and dangerous \(mood\),

(cinematic bokeh, dynamic range, vibrant colors), (shot on Kodak Vision3 IMAX)

Elaborate Evocative Prompt (284 Tokens, 222 words):

The world was dark and dangerous, filled with monsters and demons that threatened to destroy all that was good and pure. But there was one hero who stood against the darkness, wielding her sword with skill and courage. She was Aela, the fierce and powerful warrior, and she was the only hope of the world.

Aela journeyed across the land, facing all manner of dangers and challenges. She fought dragons and giants, demons and sorcerers, always emerging victorious through her strength and her bravery.

But her greatest challenge came when she faced the dark lord Morok, the ruler of the underworld. Morok was a powerful and malevolent being, and he sought to enslave the world and plunge it into eternal darkness.

Aela knew that she must defeat Morok, no matter the cost. She gathered her strength and her courage, and she faced the dark lord in a final and epic battle. The battle raged for days, and many lives were lost on both sides.

In the end, Aela emerged victorious, her sword glowing with the power of good. She had defeated Morok and saved the world from his evil, and she was hailed as a hero by all who knew her. And so she continued her journey, ever ready to face the darkness and defend the world from all that threatened it.

Again, the longer prompt has greater diversity. Which pictures are "better" may be a matter of personal preference. I would say the second set has cooler compositions, while the first set has some better details.

What if we want to change the hair color to blonde? Let's add "(blonde hair:1.2)" after "warrior" or "Aela", respectively.

Succinct prompt: blonde hair. Composition (even facial features) quite similar.

Elaborate prompt: blonde hair. Composition (even facial features) quite similar. This might be because 284 tokens (unlike 436 tokens) isn't long enough to bury "(blonde hair:1.2)". This length might well be a sweet spot for those who want both diversity and control.

On the other hand, I think the second short story (the one about Aela) also kind of sucks. Apart from the fight against Morok (which of course doesn't get depicted anywhere), there is little of relevance for SD to work with. I suspect this makes this "medium long" prompt effectively even shorter than it appears.

Third example: Xianxia initiation

Elaborate Evocative Prompt (418 Tokens, 338 words):

In a far-off land, a young warrior named Kaiya was on a quest to become the greatest fighter in the kingdom. She had trained since she was a child in the art of the sword, and had fought many battles and defeated many foes.

One day, Kaiya came across a mysterious old man who was sitting on a mountain trail. The old man had long white hair and a beard, and he was dressed in simple robes. He looked at Kaiya with piercing eyes, and said, "You seek to become the greatest fighter in the kingdom, do you not?"

Kaiya nodded, and the old man smiled. "I can help you on your quest," he said. "I can teach you the secrets of the Xianxia, the ancient art of harnessing the power of the elements to enhance your fighting skills."

Kaiya was intrigued, and she asked the old man how she could learn the Xianxia. The old man replied, "You must undergo a trial, and if you pass, I will teach you the secrets of the Xianxia."

Kaiya accepted the challenge, and the old man led her to a cave at the top of the mountain. Inside the cave, Kaiya faced a series of challenges and tests, each more difficult than the last. She fought powerful beasts and overcame dangerous obstacles, and finally, after many days and nights, she emerged victorious from the cave.

The old man was impressed, and he praised Kaiya for her skill and determination. He then taught her the secrets of the Xianxia, and Kaiya learned how to harness the power of the elements to enhance her fighting abilities. She became even stronger and more powerful, and she continued on her quest to become the greatest fighter in the kingdom.

As she traveled from one land to another, Kaiya fought many battles and defeated many foes. She became known as the greatest warrior in the kingdom, and her name was feared and respected by all. But despite her success, Kaiya never forgot the old man.

I generated these images first and was impressed with how well they capture the essence of the story. I then tried to get similar results using succinct prompts. I ended up interrogating the pictures with CLIP and using Multidiffusion ("over here I want this, over there I want this") for the later images.

Succinct Prompt (54 Tokens):

(man and woman sitting on a rock facing each other), with a view of the mountains behind them,(mythical ancient chinese warrior initiation:1.5), An Zhengwen, cinematic photography, neo-romanticism (cinematic bokeh, dynamic range, vibrant colors), (shot on Kodak Vision3 IMAX)

AND (old chinese man, long white hair, flowing robes:1.2)

AND young chinese woman, kneeling

Again, I think the results are better overall than those from the long prompt, but they also took more time and effort to achieve.
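The succinct prompt above has more structure than the earlier ones: three sub-prompts joined by AND, two of them carrying explicit "(text:weight)" groups. When a prompt gets this dense, it can help to list its parts before tweaking anything. The sketch below only splits the text and reads out the explicit weights. It says nothing about how Multidiffusion or any other extension assigns the parts to image regions, and the helper names are hypothetical.

```python
import re

# Matches "(some text:1.2)" groups that contain no nested parentheses.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def split_subprompts(prompt):
    """Split a prompt on the standalone, upper-case keyword AND."""
    return [part.strip() for part in re.split(r"\s+AND\s+", prompt) if part.strip()]


def explicit_weights(subprompt):
    """Return (text, weight) pairs for every "(text:weight)" group."""
    return [(text.strip(), float(w)) for text, w in WEIGHTED.findall(subprompt)]


prompt = """(man and woman sitting on a rock facing each other), with a view of the mountains behind them,(mythical ancient chinese warrior initiation:1.5), An Zhengwen, cinematic photography, neo-romanticism (cinematic bokeh, dynamic range, vibrant colors), (shot on Kodak Vision3 IMAX)
AND (old chinese man, long white hair, flowing robes:1.2)
AND young chinese woman, kneeling"""

parts = split_subprompts(prompt)

if __name__ == "__main__":
    for index, part in enumerate(parts):
        print(index, explicit_weights(part))
```

The split is case-sensitive on purpose, so the lower-case "and" in "man and woman" is left alone.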

As mentioned at the beginning, this is how I intend to use long prompts in the future: get a feel for what I want, and then develop a succinct prompt to actually get it.

Your goals and preferences will differ from mine, so feel free to use the information in this post however you see fit.

