Cool It With The Negative Prompt Nonsense
What's this about?
Creators, it’s time. I hope it’s not already too late. When I look around though it sure seems like it. Well, I can always say I tried—that I fought the good fight. I've been talking about this since the dawn of SD. Maybe I'm a dreamer...
Hopefully you will read this and take it to heart. Stop filling up your negative prompt with cut-and-paste voodoo nonsense! You’re doing it even before there is any sign of a problem! You think it’s prophylactic, perhaps; that it’s preventing some set of problems before they even start, but you are wrong. We can't "prevent" something that wasn't going to happen. Even if cut-and-pasting a massive wall of text with a bunch of nonsense in it is your first step and it "works." Even though it "works for you." Pasting a Dostoyevsky-length pile of text into the negative prompt before you've hit Generate for the first time is cargo cult voodoo ritualism.
There are several important elements here in your own understanding of what you are doing—what you are asking the tools to do. “That’s not doing what you think it’s doing” is my go-to phrase for this sort of thing. Even if you like the result. Remember, I'm not saying you're wrong to pile that spew in there. What I am saying is that you don't need to do that. And it can make things easier for you to skip the everything-ever-disliked-in-any-image mountain of words insertion.
Yeah, sure, maybe it looks like it works. Well, see this rock on my desk? It keeps tigers away! Look, you don’t see any tigers around here, do you? See? The problem with this argument (or just this conclusion) is multi-faceted. You got a result that you like. But did it take the inclusion of that whole long negative prompt to do that? We will see that the answer is no.
Start Simple
A simple reason not to do it is that you're adding more tokens. A lot more. So, without going any further into the technical reasons, an obvious problem is that you can’t easily see which token or tokens, these morsels of text, are causing the image to come out the way it did. Therefore, it’s harder to adjust things if you do want to tweak them further.
Future You: "Hmm, was it 'bad hands' or 'bad anatomy' or 'deformed' that did this? She's supposed to have a tennis racket, her hands aren't even in the frame! I thought 'out of frame' would keep that from happening..."
You may find that the effectiveness of something you do need to add to that massive novella of negative text is reduced. And again, you won’t know quite why. So you start shuffling things around, you start putting (((parentheses))) or (numbers:1.99) on some of the words. But you're still guessing: Which of these tokens here in the negative prompt are interfering with these other tokens in the negative prompt and keeping it from doing what I want? Why doesn’t adding this or that word to my negative prompt seem to work the way I want, or when it does work, why does it manifest something else I don’t want?
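If you're not sure what those parentheses and numbers are actually asking for, here is a rough sketch. It assumes the emphasis rules as I understand them from AUTOMATIC1111: each layer of parentheses multiplies a term's weight by 1.1, and (text:number) sets the multiplier directly. The function name emphasis_weight is made up for this illustration, and it is a simplification, not the web UI's real parser (it ignores square brackets, escapes, and mixed nesting).

```python
import re


def emphasis_weight(term):
    """Return (text, weight) for one prompt term.

    Simplified reading of AUTOMATIC1111-style emphasis (an assumption,
    not the real parser): "(text:1.4)" means weight 1.4, and every
    layer of plain parentheses multiplies the weight by 1.1.
    """
    term = term.strip()
    explicit = re.fullmatch(r"\((.+):\s*([0-9]*\.?[0-9]+)\)", term)
    if explicit:
        return explicit.group(1).strip(), float(explicit.group(2))
    depth = 0
    while term.startswith("(") and term.endswith(")"):
        term = term[1:-1]
        depth += 1
    return term, round(1.1 ** depth, 3)


for example in ["teenager", "(freckles)", "(((parentheses)))", "(numbers:1.99)"]:
    print(example, "->", emphasis_weight(example))
```

Three layers of parentheses come out to roughly 1.33, so stacking them is a fairly blunt instrument compared to just writing the number you mean.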
Hey look, even if you, dear reader, are saying to yourself right now, “screw him, he’s not the king! This is how I do it and I’m happy with the result! My images are better than his anyway!” Fine, no worries, I can’t make you change how you're doing things. Go do things how you want, and if you like the results, that's what's important. I’m asking you to consider not doing it. To change your behavior in a very minor way. If nothing else, it makes the whole process simpler. There’s one less element to have to mess with when creating. Isn’t that something you’d want? Just a little bit less stuff to do?
Why am I so sure I’m right? Well, I’ve done some tests. Lots of tests. Lots of us have done lots of messing around generating huge grids showing small changes transforming our images bit by bit. So yes, that’s part of it. More importantly, it has to do with something I said earlier: “That’s not doing what you think it’s doing.” It doesn’t (quite) work like that. You can hear that right from the mouth of one of the people who designed and coded the software! And for this article, I am not going to drown you in grids of images.
Yes, I can hear some of you thinking, “That doesn’t matter. I get the results I want; I don’t care how it works.” That sort of thinking makes me a tiny bit sad, but I understand. I don’t need to understand exactly how computer-controlled fuel injection works on my car. I just want to drive it. But isn’t it nice to know how it works? Isn’t it nice to know what you’re doing?
Not knowing about modern engine improvements doesn’t cause me problems getting from here to the grocery store. Not knowing at least a bit more about how this generative AI creates your images can hurt your ability to get the results you want, or make it take a lot more time for no good reason. If ignorance about car engines wound up with you arriving at Walmart instead of Whole Foods, I’ll bet you’d care!
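For the curious, here is a toy sketch of the mechanism as it is commonly described for classifier-free guidance. This is my assumption about how front ends like AUTOMATIC1111 wire it up, not something taken from their code: the model makes one noise prediction for the positive prompt and one for the negative prompt (an empty negative prompt still produces a prediction), and the result is pushed away from the negative prediction toward the positive one. The function name and the numbers are invented for illustration.

```python
def guided_prediction(cond, neg, cfg_scale):
    """Combine two noise predictions the classifier-free-guidance way.

    cond: prediction conditioned on the positive prompt
    neg:  prediction conditioned on the negative prompt
    Each output value is neg + cfg_scale * (cond - neg).
    """
    return [n + cfg_scale * (c - n) for c, n in zip(cond, neg)]


# Toy numbers standing in for model output; real predictions are large tensors.
cond = [1.0, 2.0, 3.0]
empty_neg = [0.0, 0.0, 0.0]
wordy_neg = [0.5, 2.0, -1.0]

print(guided_prediction(cond, empty_neg, 7.0))  # [7.0, 14.0, 21.0]
print(guided_prediction(cond, wordy_neg, 7.0))  # [4.0, 2.0, 27.0]
```

Notice that every value moved when the negative prediction changed. Under this reading, the negative prompt is not a list of items to delete; it changes the direction the whole image is pushed in, which is one way to understand why words that say nothing about a feature can still change it.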
OK, let’s take a look at some real examples. Let’s see that your negative prompt cut-and-paste spew is voodoo and that you don’t need it.
I’ve taken the generation parameters from some images that model creators put in their Civitai listings to show what their models can do.
Nofilterman, fine, but get to the point.
OK Let's look at some images
Here’s what we’re going to do for our first exercise.
Create some images with the configurations posted—generally. Some things like which sampler or hires fix or Adetailer or no Adetailer, will be different. But the prompt content will be the same.
Then, we’re going to remove the entire negative prompt. All of it. Generate again and see what we get. After we’ve done that and compared the two, we will think about the differences: how big or how small are they, and is the difference qualitatively or quantitatively important?
After that we will see what re-introducing something into the negative prompt does in terms of steering the image toward what we want. You know, we’ll use the negative prompt for its intended purpose!
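If you would rather script this comparison than click through it, the three steps above can be sketched as a pair of request bodies. This assumes AUTOMATIC1111 was started with the --api flag and that you would POST each body to its /sdapi/v1/txt2img endpoint; the helper name ab_payloads, the seed, and the settings are placeholders I made up. The only thing that matters is that everything except the negative prompt stays identical between the two runs.

```python
import json


def ab_payloads(prompt, negative_prompt, seed=1234567890, **settings):
    """Build two txt2img request bodies that differ only in the negative prompt.

    The seed and any extra settings (steps, cfg_scale, ...) are shared,
    so any difference between the two images comes from the negative prompt.
    """
    base = {"prompt": prompt, "seed": seed, **settings}
    with_negative = {**base, "negative_prompt": negative_prompt}
    without_negative = {**base, "negative_prompt": ""}
    return with_negative, without_negative


# Placeholder prompts and settings, purely for illustration.
with_neg, without_neg = ab_payloads(
    "head and shoulders portrait, big smile",
    "bad hands, watermark, out of frame",
    steps=30,
    cfg_scale=7,
)
print(json.dumps(with_neg, indent=2))
print(json.dumps(without_neg, indent=2))
```

For the third step, build one more body with a single corrective word in negative_prompt and the same seed.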
We’ll be using photopediaXL_45 since it does a great job and many of the voodoo items in negative-prompt magic spells have to do with anatomy. Here is our first positive & negative prompt:
Prompt:
photoshoot, photo studio, RAW photo, editorial photograph, film stock photograph, cinematic, posing, beautiful lady, (freckles), big smile, blue eyes, short hair, dark makeup, hyperdetailed photography, soft light, head and shoulders portrait, cover, engulfed in shadow, silhouette,, dark atmosphere, ((focused on eyes)), feminine expressions, photography, 35mm, Nikon D850 film stock photograph, Kodak Portra 400 camera f1.6 lens, 8k, UHD
Negative prompt:
(worst quality, low quality:1.4), illustration, 3d, 2d, painting, cartoons, sketch, blur, blurry, blurred, bokeh, unclear, grainy, low resolution, downsampling, aliasing, dithering, distorted, jpeg artifacts, compression artifacts, overexposed, high-contrast, bad-contrast, poorly drawn, cropped, out of frame, (physically-abnormal:2), deformed, disfigured, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, extra limbs, extra legs, extra arms, dull eyes, mismatched eyes, bad anatomy, signature, watermark, artist name, text, error
Look at that negative prompt! It's longer than the damn prompt! OK, so some things in there look like they’d be helpful if we had a particular problem—especially things like “watermark.” If you make a lot of, um, nsfw images, you know the hassle of an otherwise excellent image with an alien blob of text on the screen with it. Here’s our result:
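To put a rough number on that, here is a small sketch that counts the comma-separated terms in each prompt and flags anything pasted in more than once. These are comma-separated chunks, not the tokens the model actually sees, and the naive split breaks the weighted group at the start of the negative prompt into two pieces, so treat the counts as approximate. The helper names are mine.

```python
from collections import Counter

POSITIVE = (
    "photoshoot, photo studio, RAW photo, editorial photograph, "
    "film stock photograph, cinematic, posing, beautiful lady, (freckles), "
    "big smile, blue eyes, short hair, dark makeup, hyperdetailed photography, "
    "soft light, head and shoulders portrait, cover, engulfed in shadow, "
    "silhouette,, dark atmosphere, ((focused on eyes)), feminine expressions, "
    "photography, 35mm, Nikon D850 film stock photograph, "
    "Kodak Portra 400 camera f1.6 lens, 8k, UHD"
)
NEGATIVE = (
    "(worst quality, low quality:1.4), illustration, 3d, 2d, painting, "
    "cartoons, sketch, blur, blurry, blurred, bokeh, unclear, grainy, "
    "low resolution, downsampling, aliasing, dithering, distorted, "
    "jpeg artifacts, compression artifacts, overexposed, high-contrast, "
    "bad-contrast, poorly drawn, cropped, out of frame, "
    "(physically-abnormal:2), deformed, disfigured, poorly drawn hands, "
    "poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, "
    "extra limbs, extra legs, extra arms, dull eyes, mismatched eyes, "
    "bad anatomy, signature, watermark, artist name, text, error"
)


def terms(prompt):
    """Split a prompt on commas, dropping empty pieces such as ',,'."""
    return [t.strip() for t in prompt.split(",") if t.strip()]


def repeated_terms(prompt):
    """Return the terms that appear more than once, ignoring case."""
    counts = Counter(t.lower() for t in terms(prompt))
    return sorted(t for t, n in counts.items() if n > 1)


print("positive terms:", len(terms(POSITIVE)))
print("negative terms:", len(terms(NEGATIVE)))
print("repeated in negative:", repeated_terms(NEGATIVE))
```

A repeated term is a good sign that nobody read the list before pasting it in.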
Nice! If you use photopediaXL you probably recognize this lovely lady, this is what I get with my settings in-all-their-twistedness in AUTOMATIC1111. It's the first image in the set the author has in the showcase. Well, it’s pretty close. I’ve got quite a few tweaked values for my settings in webui.sd so I’m not using “the exact same config” and won’t get “the exact same image.” At this point, I've messed with so much in there, it would take a fresh install to get any better than this. Thankfully we’re not trying to do that; we’re comparing generations with-and-without a phonebook sized negative prompt. Let’s see what it produces when I delete the entire negative prompt and generate it again.
Interesting! We see two things immediately. She’s younger, and she has quite a bit more freckles on her forehead. Still pretty, still the same color eyes, still smiling with the same nice teeth, that sort of thing. But wait—read that negative prompt. I don’t see anything about the age of this person at all in there. Nothing. Not even something that might be kind-of-sort-of-related to age.
So, dear reader. Take a moment, pause, look away from the screen and think for a minute about it. That negative prompt had nothing about age in it at all, but that’s the most obvious change when I removed the entire negative prompt.
You back? Good. Let's say, fairly enough, she's a bit young for what we intended. We will try a bit to deal with that. We want her to be a bit older. This image is good. It’s quite good, but it wasn’t quite what we intended. What should we do? Well, how about adding teenager to our empty negative prompt?
Look! One word. One single word in that negative prompt. And what would you intuitively put in there? Teenager. We don’t want her to be a teenager. And that did it. You can see that this is the very next image in the sequence I made to demonstrate it:
There’s the result when we: 1. removed the big negative prompt and then 2. just added the word teenager to deal with her being younger than we wanted. That’s what I want to stress. Take a look close-up. She doesn’t look exactly like the original. Her face is shaped a bit differently and to be fair she still looks a bit younger than the original.
And here’s another run later after messing around with unrelated things. With that big negative prompt:
With only the word teenager in the negative prompt:
Hmm. Interesting again! These two look rather different from the first run I showed you, but me being me, I changed things (again) in the overall program settings since yesterday. These two images are different in some interesting ways. Let’s put them side-by-side.
To be fair, these two images are different qualitatively and quantitatively. I prefer the looks of the woman on the left. Looks more like what a girl I had a crush on in college turned out like a decade later. Yeah, I have good taste. Quantitatively the image on the right now has more pronounced contrast. The highlights are brighter, and the shadows are quite a bit darker. And come on, the image is actually more accurately following the positive prompt—go back and look at it, it’s got “engulfed in shadow, silhouette, dark atmosphere” right there in the prompt! Be honest with yourself. The 2nd image more accurately depicts what’s in the prompt, although we seem to have it fighting itself over the contrast and lighting mood information differences in the positive and negative prompts. Which is exactly one of the aspects of this I’m trying to show you.
But this could have been different from your starting point, too. When we look at the prompts, the negative prompt we deleted had some words in there about lighting and contrast! Let’s add those back in and see if that helps. Our new negative prompt is simply:
“overexposed, high contrast, teenager.”
Hmm. Pretty darn nice. She can come over anytime she wants. Ahem...
Let’s look again at it side by side with the image that started us on this path—the one from Civitai, and the most recent image I have, where I start with no negative prompt and eventually just added “overexposed, high contrast, teenager” to correct what I wanted to correct: Yes, they're different. I've got a ton of surely different stuff under the hood and this wasn't about making an image exactly like one you got the parameters for from online!
Conclusion
In conclusion, let's look one more time at what I did with my images. My first image had that huge negative prompt, and the last one had just the few words. On the right is the original generation on my machine. On the left is the "final" image (really only one step in the real world), which has only “overexposed, high contrast, teenager” in the negative prompt:
And there you have it. That's the point. I got the one on the left without a huge pile of negative text, the one on the right had that mountain of please-no-bad-hands words. Pretty close, yeah? And for me, the one I generated is um, a bit hotter. But everyone has their personal preferences... Pretty close is the point. And that’s with the absolute oddity of starting with an image prompt that says, “engulfed in shadow, silhouette, dark atmosphere...”
Dude... come on. Does either of the images above look at all like a silhouette? Is she “engulfed in shadow?” Would you say that description is what’s in the image? Think about it. I’ll wait. Nope, objectively not. So what we see here is another key point: I had to add negative prompt information to fight against positive prompt information that was not being properly generated in the image! (Probably because of the chaos created by that gargantuan voodoo negative prompt.)
For reference, here is the simplified prompt and negative prompt:
photoshoot, photo studio, RAW photo, editorial photograph, film stock photograph, cinematic, posing, beautiful lady, (freckles), big smile, blue eyes, short hair, dark makeup, hyperdetailed photography, soft light, head and shoulders portrait, cover, engulfed in shadow, silhouette,, dark atmosphere, ((focused on eyes)), feminine expressions, photography, 35mm, Nikon D850 film stock photograph, Kodak Portra 400 camera f1.6 lens, 8k, UHD
Negative prompt: overexposed, high contrast, teenager
Yes, I was waiting to talk about this. My successful fight to suppress “engulfed in shadow” was just three words (that also make total sense) out of a huge list of things that weren’t problems in the image either way. I could continue, next just cleaning up the prompt more, removing things that don’t make sense. (Why would you have soft light and silhouette in there together? They’re practically opposites. Not quite antonyms but you get the point.)
In closing, do what you want. Do what works for you. But hopefully what you’ve just read will make you reconsider just cutting and pasting a wall of text into the negative prompt “just because,” or “it helps.” It doesn’t help. Not in the way you mean. It’s not doing what you think it’s doing. It can only provide real “help” if there is something there you want to remove. This prompt didn’t have any distorted hands, it wasn’t blurry, it didn’t have her out of the frame. There was no need to try to correct things that didn’t need correcting—because they weren’t happening! And then without reading these words here and thinking about it, confirmation bias might have had you thinking that it was helping—just like my magic rock that keeps tigers away.
Tip of the day: My go-to advice if you’re struggling to get a prompt to generate the sort of thing you want when it seems like it should: before you start making big additions to the negative prompt, just try a different seed. “Respect the seed,” is how I put it. Changing seeds can get rid of extra legs, weird bodies, insane faces, all that stuff, by just rolling the dice. Click that die in Automatic1111 or put a -1 in the seed field. Some seeds just suck for some prompts. It’s not worth fighting that, in my opinion.
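If you want to roll the dice in bulk, a sketch like this builds several copies of one request that differ only in the seed. The helper name seed_sweep and the example payload are hypothetical; the field names assume the same AUTOMATIC1111 txt2img request shape you would use with its API. It picks explicit seeds instead of -1 so you can go back to the one that worked.

```python
import random


def seed_sweep(payload, count=4, rng_seed=0):
    """Return `count` copies of `payload`, each with a different seed.

    Seeds are drawn without repetition from a local random generator,
    so the same rng_seed always gives the same list of seeds.
    """
    rng = random.Random(rng_seed)
    seeds = rng.sample(range(2**32), count)
    return [{**payload, "seed": s} for s in seeds]


# Placeholder request body: same prompt, empty negative prompt, new seeds.
base = {"prompt": "head and shoulders portrait, big smile", "negative_prompt": ""}
for run in seed_sweep(base, count=4):
    print(run["seed"])
```

Only once several seeds show the same problem is it worth reaching for the negative prompt.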
Now go forth, starting with an empty negative prompt!
Postscript
Because of the intermediate steps I took and the commentary along the way, it may not be all that clear what the simple difference of with-and-without makes. Here are some additional images that have nothing modified in the config at all except the removal of the entire negative prompt.
With a big, long negative prompt:
Empty negative prompt:
Adding “teenager, teen, child” to our empty negative prompt:
Another example with the big negative prompt:
Empty negative prompt:
You don’t need that voodoo text in your negative prompt. I rest my case. That’s all for now, Cheers!