
[Insights for Intermediates] - How to craft the images you want with A1111

Introduction

Who is this guide for?

This guide is for the Stable Diffusion user who has a firm grasp of the basics, but sometimes struggles to translate the images they see in their head into pixels on a screen. This guide is for the user who doesn't understand why their well-thought-out, detailed prompt doesn't produce the desired results.

Once again, this guide will not focus on inpainting, img2img, ControlNet, or any of the other cool things you can do with Stable Diffusion. If you are interested in those topics or are looking for a guide specifically about them, I highly recommend A13JM's guide - Making Images Great Again!

Please feel free to DM me with any questions or comments, and I will happily respond. I enjoy answering questions!


I'm now tracking updates at the bottom of the page if you're interested in seeing what has changed.

Assumptions

This guide will assume you have some basic familiarity with Automatic1111 and with generating Stable Diffusion images. If you are a brand-new beginner, this guide is not for you. I recommend you read my previous guide, or at least give it the once-over to make sure you know everything in it, as this guide builds on those concepts.

This guide also assumes that you understand the basics of prompt weights, namely that:

blue eyes = (blue eyes:1.0) 

(blue eyes) = (blue eyes:1.1)

((blue eyes)) = (blue eyes:1.21)

(((blue eyes))) = (blue eyes:1.331)
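In other words, each additional pair of parentheses multiplies the term's weight by 1.1. A quick Python sketch of the arithmetic, just to make the pattern explicit:

# Each layer of parentheses multiplies the effective weight by 1.1,
# so n nested parentheses give a weight of 1.1 ** n.
for n in range(4):
    print(f"{'(' * n}blue eyes{')' * n}  ->  (blue eyes:{1.1 ** n:.4g})")

# Prints:
# blue eyes  ->  (blue eyes:1)
# (blue eyes)  ->  (blue eyes:1.1)
# ((blue eyes))  ->  (blue eyes:1.21)
# (((blue eyes)))  ->  (blue eyes:1.331)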

This guide considers the multiple-parentheses style outdated and discourages it, since it offers less control than numerical weights, so it will not be used here.

Content Advisory

This guide uses some lewd and suggestive images to keep readers engaged; however, there is nothing sexually explicit in it. If it were a movie, it would be PG-13. That said, while this guide doesn't exactly fall into NSFW territory, it does flirt with that line, so maybe don't read it during your lunch break at the office.

Automatic1111 Lobe Theme

If my Stable Diffusion looks different than yours, it's likely because I am using a skin extension called sd-webui-lobe-theme, which can be found like any other extension on the Extensions tab. Highly recommended.

Yeah, I'm not reading this whole thing. Just tell me what I need to know.

Each section ends with a summary paragraph that starts with Insight. Just read that paragraph for each section that interests you.

Models Used

This guide will use some popular but slightly lesser-known models that I enjoy using. Here they are for reference, in order of appearance:

DarkSushi2.5 V3

Colorful V3.1

StingerMix V3.2

3DAnimationDiffusion V1

epiC2.5D V1

RestlessExistence V2 - Shameless plug for my own model. Check out RestlessExistence V3!

Real time image generation preview

Before we get into anything else, I want you to change a setting in your Automatic1111 Stable Diffusion Webui.

Go to the "Live Previews" section, and change the "Live preview display period" to 1, or 2.

Then change the "Progressbar and preview update period" to 50 milliseconds.

If you have a good GPU like an RTX 3070, you can set this to an even lower value, like 10 ms.

If you have a decent GPU like an RTX 2060, try setting the delay to 300-500 ms, and change the live preview display period to something like 5.

If you have a low-power GPU like a GTX 1060, I would recommend skipping this step.

Apply your settings. You should not need to restart the UI.
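If you prefer to script these settings instead of clicking through the UI, A1111 exposes them through its API when the webui is launched with --api. A minimal sketch in Python; the exact option key names below are my assumption about how the "Live previews" settings are stored, so confirm them against a GET request to /sdapi/v1/options before relying on them.

# Sketch: apply the Live previews settings over the A1111 API (requires --api).
# ASSUMPTION: "show_progress_every_n_steps" and "live_preview_refresh_period"
# are the keys your webui version uses; check GET /sdapi/v1/options to be sure.
import requests

WEBUI = "http://127.0.0.1:7860"

options = {
    "show_progress_every_n_steps": 1,    # "Live preview display period"
    "live_preview_refresh_period": 50,   # "Progressbar and preview update period", in ms
}

response = requests.post(f"{WEBUI}/sdapi/v1/options", json=options)
response.raise_for_status()
print("Applied:", options)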

Let's go back to txt2img. If you have it downloaded, select darkSushi25D25D_v30 and paste this text into the positive prompt box:

(medium shot:1.4), 1girl, sultry, seductive, risque, long hair, braids, braided, hair up in ponytail, blonde hair, (blue eyes:0.75), (tavern wench:1.3), (cleavage:1.2), (bustier:1.1), (full skirt:1.1), (medieval tavern:1.3), flirty, smirk, tease, playful expression, smug, smirk, (looking at viewer:1.4), standing, hands on hips, hands on waist, leaning forward, (masterpiece:1.5), (high quality:1.4), subsurface scattering, heavy shadow, (intricate, high detail:1.2)
Negative prompt: (worst quality:2), (low quality:2), (normal quality:2), (b&w:1.2), (black and white:1.2), (blurry:1.2), (out of frame:1.2), (ugly:1.2), (cross-eyed:1.2), (disfigured:1.2),(deformed:1.2), (extra limbs:1.2), (b&w:1.2), (blurry:1.2), (duplicate:1.2), (morbid:1.2), (mutilated:1.2), (poorly drawn hands:1.2), (poorly drawn face:1.2), (mutation:1.2), (deformed:1.2), (bad anatomy:1.2), (bad proportions:1.2), (watermark:1.5), (text:1.5), (logo:1.5)
Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1701800230, Size: 512x768, Model hash: 54d8a62d61, Model: darkSushi25D25D_v30, Denoising strength: 0.4, Clip skip: 2, Hires upscale: 2, Hires steps: 15, Hires upscaler: 4x-UltraSharp, Version: v1.5.1

Now click the down-left arrow underneath Generate (ignore my Enqueue button, that's from an extra extension).


This will apply all those settings to your A1111 at once. If you get an error saying it can't find 4x-UltraSharp, please read that section of my first guide.

These will mostly be the default settings for the rest of the guide as well, if you want to follow along. I will not be pasting in the PNG info for each image, but I will give the seed when relevant.

Click Generate, and you should see Stable Diffusion drawing the image in near real time.
(I can't embed GIFs in this article, so please click the link below. It links to Civitai and opens in a new tab.)

>>> Click For GIF <<<

How cool is that!? We can now watch how Stable Diffusion actually generates our images!

A couple of things to note about the GIF. First, the image composition and main attributes are generated by the sampler. Then there is a clear point in the middle of the GIF where the image suddenly turns sharp. This is where the upscaler takes over. As you can see in the GIF, most of the detail from the upscaler appears during its first 5-7 steps, so a very high step count on the upscaler generally isn't worth it, unless your image has a lot of detail.

Here is the resulting image for reference:

Full Res


Insight: Watch how Stable Diffusion generates your image. Pay attention to the first few frames of generation. They reveal how the sampler is attempting to draw your prompt initially. This is extremely useful because if SD shows you something you didn't expect, you can add it to your negative prompt to help guide the generation even further.

An example of this: you prompt that it's raining, but you always get an umbrella, so you add umbrella to the negative prompt. Then, all of a sudden, your subject is sometimes wearing a hat, even though you didn't prompt one. Well, if you put "umbrella" in the negative prompt and the first few frames feature an umbrella, the sampler has to blend the umbrella out of existence, so it turns it into a hat. Sometimes the sampler is successful and just removes the umbrella entirely. Other times you get the hat.

Samplers Revisited

Let's revisit samplers from my beginner guide. There are three main types of samplers:

  1. Ancestral

  2. Evaluating

  3. Weirdos

If how samplers work really interests you, I recommend this guide. However, I feel the best way to learn about samplers is to simply pick one and generate an image with it. Watch how they generate images. For example, even though ancestral samplers like Euler a and DPM++ 2S a Karras are of the same type, they generate images differently.

Insight: Ancestral samplers tend to be more "creative", strictly with respect to how they sample in the first few preview frames. Evaluating samplers are more "linear", meaning they seem to know how they want to generate an image from the start. Then there are the weird samplers that just do goofy stuff that doesn't make sense, like UniPC. None of these samplers is better than the others, so find your favorites. I personally prefer the DPM++ samplers, though, as I find the images to be "fuller". The best advice is to simply watch how each one generates images.
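If you want to compare sampler families side by side outside the webui, here is a minimal sketch using the diffusers library (not A1111): it renders the same prompt and seed with an ancestral scheduler and an evaluating one so you can see how differently they converge. The model ID and prompt are just examples.

# Same prompt and seed, two scheduler families; compare the two outputs.
# Uses the diffusers library rather than A1111; the model ID is an example.
import torch
from diffusers import (
    StableDiffusionPipeline,
    EulerAncestralDiscreteScheduler,  # ancestral
    DPMSolverMultistepScheduler,      # evaluating (DPM++ 2M style)
)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "1girl, tavern wench, medieval tavern, masterpiece"

for name, scheduler_cls in [("ancestral", EulerAncestralDiscreteScheduler),
                            ("evaluating", DPMSolverMultistepScheduler)]:
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    generator = torch.Generator("cuda").manual_seed(1701800230)
    image = pipe(prompt, num_inference_steps=30, generator=generator).images[0]
    image.save(f"sampler_{name}.png")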

Stable Diffusion Critical Insights

Watching Stable Diffusion generate tens of thousands of images has given me two critical insights about how Stable Diffusion works.

A model cannot draw something it's never seen

This is hopefully self-explanatory, and the most obvious point, so I won't spend too much time here. It is so critical, though, that it's worth stating outright.

A model cannot generate an image of something it has never seen before.

For example, if you type a prompt that says "woman, jacket, hat", every single model can generate an image with that prompt. However, type your name into a prompt, and no matter how hard you try to describe your likeness to the Stable Diffusion model, it will not generate you, because the model has nothing in its training data that it can correlate with your name.

Insight: It's best to use the most precise prompt language possible so the model has the best chance of understanding what you want in your image. Don't say "fish tank" when you mean "aquarium", unless you want an image of a fish with an M1 Abrams cannon. Hmmm, on second thought... that might be a pretty good image.

Stable Diffusion will try to blend things that are in the same logical space

This one is harder to understand, but it's more important. The best way I can illustrate it is with an example.

For this section we will be using the wonderful model: Colorful_v3.1.

Let's say you want a scene like this one:

This is a woman, in a sheer dress, sitting on a chair, in a luxurious room, in some kind of underwater habitat, with a window that is viewing the outside ocean.

OK, so let's write a rough prompt:

Prompt: a woman, sheer dress, sitting on a chair, luxurious room, underwater habitat, ocean floor, windows, (masterpiece:1.4), (high quality:1.4)

Negative: (low quality:2)

Wait, why is there no room? Why is it just an underwater chair? We got the woman, the underwater setting, the chair, and the sheer dress right. So what about the room, habitat, and windows? Why doesn't this work? Let's adjust the prompt and try to tell the model that we want a room with windows.

Prompt: a woman, sheer dress, sitting on a chair, (luxurious room:1.2), (underwater habitat:1.2), ocean floor, (windows:1.2), (masterpiece:1.4), (high quality:1.4)

Negative: (low quality:2)

This is still not working. We clearly have a room with windows now, but the water is inside the room. We want the water outside. Let's tell the model more firmly that the scene is an underwater habitat. Let's also tell the model that we don't want the subject to be wet.

Prompt: a dry woman, sheer dress, sitting on a chair, (luxurious room:1.2), (underwater habitat:1.4), ocean floor, (windows:1.3), (masterpiece:1.4), (high quality:1.4)

Negative: (wet:1.3), (low quality:2)

We will never get anywhere along these lines. The reason is simple: the key problem is the term "underwater", because it sits in the "top-level" logical space. Underwater affects everything in the prompt. It's in the same logical space as everything else you could put in the prompt, so the model tries to blend underwater into everything.

Note: When I say "top-level" or "logical space", these are terms I made up; they are not official Stable Diffusion terms.

Insight: Make sure you are thinking about the terms in your prompt at the logical level they actually occupy, taking care not to misalign them. We'll take a look at how to do this successfully in the next section.

Draw what you see

There is a well known saying given to students who are studying art that goes something like:

Draw what you see, not what you think you see.

The same is true for Stable Diffusion models. In the above example, we think we are underwater, and logically, that is true. But the model doesn't know anything about our logic. It only knows images.

Let's therefore think about what we actually see. Well, we see a room, with a glass window, and water and fish on the other side.

What words can we use to describe a glass window with water and fish on the other side? More importantly, the words have to be at the same scope as the room's attributes and furnishings.

What about an aquarium?

Let's try it, keeping in mind that the model wants to blend together things that are in the same logical space. In this case, the aquarium and the room's windows are probably similar enough that the model can blend them without much issue.

Prompt: a woman, sheer dress, sitting on a chair, luxurious room, aquarium windows, (masterpiece:1.4), (high quality:1.4)

Negative: (low quality:2)

It fits perfectly. Instantly, we are almost there. The model seems to understand exactly what we want now. Let's refine it a little further. Let's add "ocean" to give the window some more depth (get it, get it? depth, lol).

Prompt: a woman, sheer dress, sitting on a chair, luxurious room, aquarium windows, ocean, (masterpiece:1.4), (high quality:1.4)

Negative: city, (low quality:2)

And literally a couple of generation attempts later...

Final Image (Full Res)

Now that's more like it. We could keep refining this prompt to squeeze more quality out of it, but that is outside the scope of this guide. For now, we just want to make sure we are choosing our words as carefully as we can when constructing good-quality prompts.

Insight: Think about how the model wants to generate images. Think about what the model is trying to blend together. Set up the model to blend attributes that are in the same "logical layer". Do this by thinking clearly about what the image in your head actually looks like visually.

Understanding term position and weights

For this section, we are going to use the wonderful StingerMix.

Let's say we have this prompt:

Prompt: (extreme close up:1.1), (ecu:1.4), (detailed face:1.4), (bokeh:1.3), (depth of field:1.3), 1girl, (succubus:1.4), (horns:1.3), (medieval dungeon:1.3), (stone walls:1.2), (dank and damp:1.2), (torchlight:1.1), (rusty bars:1.2), (chains:1.1), (narrow passages:1.2), (shadowy corners:1.2), (octane render:1.3), (masterpiece:1.3), (hires, high resolution:1.3), (:1.3), subsurface scattering, realistic, heavy shadow, ultra realistic, high resolution

Negative: (text:2)(medium shot:1.2), (low quality:2), (normal quality:2), (lowres, low resolution:2), BadDream, (UnrealisticDream:1.25)

... and it produces this image (Seed is 2888135920).


This is a fine image, but let's say we want her eyes to be yellow. OK, so let's add "yellow eyes".

yellow eyes, (extreme close up:1.1), (ecu:1.4), (detailed face:1.4), (bokeh:1.3), (depth of field:1.3), 1girl, (succubus:1.4), ...

Well, her eyes are certainly yellow now, but so is her outfit, which is undesired. Let's adjust the weight. Let's use (yellow eyes:0.8).

(yellow eyes:0.8), (extreme close up:1.1), (ecu:1.4), (detailed face:1.4), (bokeh:1.3), (depth of field:1.3), 1girl, (succubus:1.4), ...

The image is slightly different, but her outfit is still yellow. Let's reduce the weight to 0.1.

(yellow eyes:0.1), (extreme close up:1.1), (ecu:1.4), (detailed face:1.4), (bokeh:1.3), (depth of field:1.3), 1girl, (succubus:1.4), ...

Wait, why is her outfit still yellow? Why isn't the weight doing anything?

This is because terms can be positionally dependent. When Stable Diffusion "digests" your prompt, it does so in chunks. Sometimes, certain things get bumped out of a chunk to make room for other terms. However, the very first thing in each chunk will always be processed. This is why you can set the weight to 0.1 and still have it affect the image.

The above paragraph is an oversimplification, but it gets the point across.
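If you want to see roughly where those chunks fall, you can count CLIP tokens yourself. A minimal sketch using the Hugging Face transformers tokenizer; A1111 works in chunks of about 75 tokens, but it also strips the weight syntax and prefers to break at commas, so treat this as an approximation rather than the webui's exact behavior.

# Rough sketch: count CLIP tokens in a prompt and show where 75-token
# chunks would break. This approximates A1111's chunking; the webui strips
# the (term:weight) syntax first and tries to split at commas.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = "extreme close up, detailed face, bokeh, depth of field, 1girl, succubus, horns, medieval dungeon"
tokens = tokenizer.tokenize(prompt)
print(f"{len(tokens)} tokens")

CHUNK = 75
for i in range(0, len(tokens), CHUNK):
    print(f"chunk {i // CHUNK} starts with: {tokens[i:i + 5]}")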


Let's try moving (yellow eyes:0.1) to just after 1girl.

(extreme close up:1.1), (ecu:1.4), (detailed face:1.4), (bokeh:1.3), (depth of field:1.3), 1girl, (yellow eyes:0.1), (succubus:1.4), ...

If you look very closely, you can see her eyes have the slightest hint of yellow in them. This isn't what we wanted either. Now that we've moved yellow eyes, it's back to increasing the weight again.

(extreme close up:1.1), (ecu:1.4), (detailed face:1.4), (bokeh:1.3), (depth of field:1.3), 1girl, (yellow eyes:0.5), (succubus:1.4), ...

Well, things are trending in the right direction, so let's try the default weight, i.e. no explicit weight at all (which equals 1.0).

(extreme close up:1.1), (ecu:1.4), (detailed face:1.4), (bokeh:1.3), (depth of field:1.3), 1girl, yellow eyes, (succubus:1.4), ...

Excellent, this is exactly what we wanted! Now that her eyes are yellow, let's change her outfit to red.

(extreme close up:1.1), (ecu:1.4), (detailed face:1.4), (bokeh:1.3), (depth of field:1.3), 1girl, yellow eyes, red outfit, (succubus:1.4), ...

...and we are right back where we started.

It's tempting to think, "well, I'll just increase the weight of red outfit then!", but this would be a mistake. If you go down this road, you will find yourself adjusting weights endlessly while trying to refine your prompt, without realizing that position also matters.

Rather than adjusting the weight of "red outfit", let's move it to just before "1girl".

(extreme close up:1.1), (ecu:1.4), (detailed face:1.4), (bokeh:1.3), (depth of field:1.3), red outfit, 1girl, yellow eyes, (succubus:1.4), ...

Excellent! Full Res

Insight: The position of a term in your prompt carries significant "weight" on its own. This positional weight is sometimes enough to overpower the other weights in your prompt. If you find you are having a difficult time getting Stable Diffusion to respect your weights, try shifting terms around instead. Therefore, you should...

Focus on Composition first

Focus on composition first, then add details. This builds off of the previous section and won't have any images to go with it.

Put the most important things into your prompt first. Then, once you find that you have everything that is critical to your image, start adding refining details. If you craft your images in this incremental fashion, you won't run into problems where some random term you didn't expect is anchoring the whole image, such as the yellow eyes term in the previous section. There, it was obvious what the offending term was, but it won't always be so obvious.

Suggestion: Turn off hires fix when you are trying to refine your composition. Hires fix won't "fix" your image unless you're using really high base resolutions. Keep your resolution at 512x768 and just slam out images looking for that perfect composition.

Insight: Build your prompts with composition in mind first. Once the model starts giving you the broad strokes of what you want, then add details to your prompt. Otherwise, you will be trying to de-conflict details and composition at the same time.

Learn some basic photography terms

Many Stable Diffusion models are chock-full of photographic data. A great many of those photographs are labeled with formal photography terms that describe the shot. Learning a basic set of photography terms is therefore a very good way to instantly gain more control over your prompts and images.

For this section we are going to use the excellent model 3dAnimationDiffusion from Lykon.

The prompt for all of these images will be:

Prompt: a woman, redhead, hazel eyes, <photography term>

Negative: FastNegativeV2, BadDream, UnrealisticDream

IMPORTANT NOTE: If you put something in the prompt, the model will try to generate it. For example, if you say "close up face shot" and then put "shoes", the model will struggle, won't know how to blend those together, and will likely ignore your close-up term. Likewise, if you go into detail describing the subject's clothing, don't be surprised when the model refuses to obey your photography terms and only does a full shot, because it has all this prompt data telling it that it needs to draw this specific outfit.

Since the model is trying to generate all the things in your prompt, if your photography term is the easiest one to ignore, well, then guess what happens... it gets ignored.

Extreme close up - Captures the face, first and foremost.

Close Up - Face is still the primary focus, but it's not as close up.

Medium shot - Typically waist or chest and up. Your standard portrait shot.

Cowboy shot - Used to show cowboys in movies, whose pistols were on their hips, when a medium shot was too close to capture the pistol. Typically from the knees up. You have to be a little careful with this one: if you put (cowboy shot:1.7) in your prompt, you're going to get an 1880s ranch hand on a horse driving cattle. It's best to put "cowboy" in the negative prompt. Also known as an "American shot" or sometimes a "3/4 shot". Cowboy shot is mostly found in Booru tags (anime models) and therefore might be heavily model dependent.

Full shot / Full Frame / Full Body - All mean the same thing. You want the subjects full body.

Wide shot - Used with landscape dimensions (width greater than height). It's basically a cowboy shot but with more surroundings.

Dutch Angle / Dutch Tilt - This is the "camera tilt" effect. It can be used with either portrait or landscape, but it's more pronounced in landscape.

High Angle Shot - This is the top down view.

Low Angle Shot - I'll give you one guess what this is. It's the angle from the floor, looking up. I find that Stable Diffusion often has trouble with this shot. As such, I don't have a good image for it, so we'll skip it.

Bokeh - This is the aesthetic blur you sometimes see, especially in nighttime photography. The goal is to blur the background so much that distant lights are only visible as soft circles.

Depth of Field - Similar to bokeh, but not nearly as pronounced. The background is out of focus, but it's not completely obscured. This was generated with "cowboy shot".

Insight: Having a basic knowledge of photography terms can instantly give you more control over your images. However, in my personal experience, Stable Diffusion sometimes needs to be coerced into respecting the photography terms in your prompts, so the weight might have to be a bit higher than you expect. If I had to guess, it's because the photography term is the easiest thing for the model to ignore when there are too many conflicting things in the prompt, but that's speculation. The shots covered above only scratch the surface. If you want to know more, I suggest this blog post from StudioBinder.

Use Textual Inversions (Embeddings)

Textual Inversions (TIs) are basically "super terms" that can be put into your prompts. The great thing about TIs is that they can be shifted around easily, and you can apply weights to them. That last part is huge and cannot be overstated, so I'll repeat it.

You can apply weight values to your textual inversions.

For example, one of the most popular TIs is bad-hands-5. It's normally pretty good, but sometimes it just doesn't quite deliver. Well, you can crank that baby up like (bad-hands-5:1.5), and all of a sudden you will notice that your images' hands are better.

Now I know what you're thinking: "OK, so I'll just crank up the weights on these TIs all day and I'll generate the best images of my life." Not so fast. Cranking up the weights on TIs does come with a cost: it can clamp down on image creativity. If you ratchet FastNegativeV2 up to (FastNegativeV2:2), you will feel the AI start to struggle, because you have put it into a tiny little box where everything has to be exact, and that's not how Stable Diffusion generates its best images. But what if you still want FastNegativeV2, just without it clamping down so harshly? Well, you can reduce its weight, like (FastNegativeV2:0.8).

Let's relate TIs to roadside guardrails. Guardrails are great, but not if they force you to drive in a strictly straight line. You want just enough guardrails that you don't go flying off the road, but you can still go where you want.
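For context, a TI is just a small learned embedding file that maps a trigger word to one or more extra token vectors, which is why it behaves like a "super term" in your prompt. Here is a minimal sketch of loading one outside the webui with the diffusers library; the file path and token name are placeholders, and per-term weighting like (FastNegativeV2:0.8) is an A1111 prompt feature rather than something this snippet reproduces.

# Sketch: load a textual inversion embedding into a diffusers pipeline.
# The embedding path and trigger token are placeholders; in A1111 you would
# simply drop the file into the embeddings folder and type its name in a prompt.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Register the embedding under a trigger token we can use in prompts.
pipe.load_textual_inversion("./embeddings/FastNegativeV2.pt", token="FastNegativeV2")

image = pipe(
    "a woman, redhead, hazel eyes",
    negative_prompt="FastNegativeV2",  # the TI acts as a bundled negative here
    num_inference_steps=30,
).images[0]
image.save("ti_example.png")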

Here is a list of some excellent TI's you should consider using:

FastNegativeV2 - All purpose negative for all types of content. Jack of all trades. Master of None. Good enough for almost everything.

BadDream / UnrealisticDream - Instant quality increases. BadDream is for general quality. UnrealisticDream is more for photo-realism. The quality increase is more subtle than FastNegativeV2's; it mostly shows up in the details.

fcPortrait Suite - The easiest way to get portraits, bar none. These TIs are criminally underrated. The suite also includes fcNegative, which is another great negative with a lighter touch than FastNegativeV2 (this is subjective, though).

CyberRealistic Negative - A great negative geared towards photo-realism.

bad-hands-5 - Pretty self explanatory. Focuses on making hands look normal.

Insight: Textual Inversions, sometimes called Embeddings, are a very easy way to increase image quality. They offer a wide degree of control, especially considering that their weight can be adjusted.

Use LoRAs

If an image is worth a thousand words, then a LoRA is worth a thousand prompts. LoRAs are so powerful that there are countless articles written about them. I'm not going to cover those. Instead, I'm going to focus on a couple of LoRAs that I find most useful.
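As background for why the number in <lora:name:weight> behaves like a blend knob: a LoRA stores a pair of small low-rank matrices for each targeted layer, and the weight scales how strongly their product is added onto the base model's weights (real LoRAs also carry an alpha/rank scaling factor). A rough, self-contained sketch of that idea, not A1111's actual code:

# Rough sketch of the LoRA idea: W' = W + weight * (B @ A), where A and B are
# small low-rank matrices. The number in <lora:more_details:0.6> plays the role
# of "weight" here. Purely illustrative; shapes and values are made up.
import torch

d_out, d_in, rank = 768, 768, 8

W = torch.randn(d_out, d_in)        # a base model weight matrix
A = torch.randn(rank, d_in) * 0.01  # LoRA "down" projection
B = torch.randn(d_out, rank) * 0.01 # LoRA "up" projection

def apply_lora(W, A, B, weight=0.6):
    """Return base weights with the scaled low-rank update merged in."""
    return W + weight * (B @ A)

W_patched = apply_lora(W, A, B, weight=0.6)
print("size of the update:", (W_patched - W).norm().item())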

Let's use the epiC2.5D model from epicRealism, and let's write a prompt for an image we can test LoRAs with.

Prompt: (cowboy shot:1.4), (3/4 shot:1.4), 1girl, skimpy, revealing, racy, (ancient priestess:1.3), (egyptian:1.2), (aztec:1.2), (mayan:1.2), (ornate headdress:1.1), (jewelry:1.2), (ceremonial robes:1.1), (gold:1.1), tattoos, (looking at viewer:1.4), standing, hands behind back, tropical rainforest, lush foliage, exotic plants, jungle, (octane render:1.3), (masterpiece:1.3), (hires, high resolution:1.3), subsurface scattering, realistic, heavy shadow, ultra realistic, high resolution

Negative: (cowboy:1.2), (medium shot:1.2), (nude, naked, nsfw, nipples, vagina, pussy, topless, bare breasts:1.5), (modern:1.5), (casual:1.4), (minimalist:1.3), (jeans:1.3), (t-shirt:1.3), (low quality:2), (normal quality:2), (lowres, low resolution:2), BadDream, UnrealisticDream, (bad-hands-5:1.25), (FastNegativeV2:0.75)

Seed: 3304805436

We'll use this image as a base.

Detail Tweaker - An absurdly good LoRA. If you aren't using this, you are generating images on hard mode. Adds or removes little details from your images. If I could only choose one LoRA, it would be this one.

Here is the base image with the detail LoRA at 0.6 weight.

<lora:more_details:0.6>

As you can see, it added more details but also changed her robe color. However, we didn't specify a robe color in our original prompt, so that's not the LoRA's fault.

LowRA - Stable Diffusion really likes bright images. Sometimes you don't want bright images. In those cases, you use LowRA. Here is the base image with LowRA set to 0.7.

<lora:LowRA:0.7>

If we compare this with our first image, it's clearly darker.

Now, here are both of them combined.

<lora:LowRA:0.7> <lora:more_details:1>

As we can see, simply applying LoRAs can bring useful properties to our images, without any effort spent trying to describe to the model what you're looking for. Consider what you would have to type into your prompt to achieve these effects. You could use "darker", "ultra detailed", etc., but those terms get cumbersome quickly, and they may not even have the desired effect.

Insight: If you're not using LoRAs, you should be. They are often the only way to achieve certain effects. They are extremely powerful when used correctly and can instantly give you the tools you need to craft your image.

Use ADetailer

You might have seen ADetailer mentioned here and there and wondered what it was. ADetailer is "Restore faces" on steroids. It's a collection of smaller AI models that perform post-processing on your images after Hires fix. Why is this useful? It fixes faces. It fixes eyes. It fixes hands. In other words, it fixes everything Stable Diffusion normally has trouble with.

It uses simple inpainting to do this. You can even specify different prompts for each fix.

As an example, you can tell the ADetailer face pass to color the eyes a certain color, and then you don't even need to put the eye color in your main prompt, which instantly reduces term collisions.
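Conceptually, ADetailer's flow is: detect a region (face, eyes, hands), turn the detection into a mask, then inpaint only that region with its own prompt. Here is a rough sketch of that flow using the ultralytics and diffusers libraries; the detector weights, inpainting model, and input image are placeholders, and this is not ADetailer's actual code (ADetailer crops each detection, fixes it at higher resolution, and pastes it back).

# Rough sketch of the ADetailer idea: detect -> mask -> inpaint with its own prompt.
# Uses ultralytics YOLO for detection and a diffusers inpainting pipeline.
# The detector weights, inpainting model, and input image are placeholders.
import torch
from PIL import Image, ImageDraw
from ultralytics import YOLO
from diffusers import StableDiffusionInpaintPipeline

image = Image.open("base_image.png").convert("RGB")

# 1) Detect faces (ADetailer ships detectors such as face_yolov8n.pt).
detector = YOLO("face_yolov8n.pt")
results = detector(image)[0]

# 2) Build a mask covering each detected box (white = region to repaint).
mask = Image.new("L", image.size, 0)
draw = ImageDraw.Draw(mask)
for box in results.boxes.xyxy.tolist():
    draw.rectangle(box, fill=255)

# 3) Inpaint only the masked region with a dedicated prompt. For simplicity this
# runs at the pipeline's default resolution instead of crop-and-paste like ADetailer.
inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")
fixed = inpaint(
    prompt="feminine face, blue eyes",
    negative_prompt="BadDream, UnrealisticDream",
    image=image,
    mask_image=mask,
).images[0]
fixed.save("adetailer_style_fix.png")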

These last two sections will use my own model, RestlessExistence. Yes, I know it's a shameless plug, but I waited until the very end, so please don't hate me.

Anyway, let's generate a base image, like before, using all the techniques we have discussed.

Prompt: (cowboy shot:1.4), (3/4 shot:1.4), 1girl, brunette hair, (shoulder length hair:1.2), (short black cocktail dress:1.3), simple pendant necklace, strapless, bare shoulders, makeup, mascara, rosy cheeks, smokey eyes, lipstick, (beautiful, gorgeous:1.1), content, relaxed, soft smile, (looking at viewer:1.4), standing, luxurious night club, dim mood lighting, accent lighting, high end, upscale, posh, fancy ornate bar, sultry, seductive, risque, leaning forward, colorful, (octane render:1.3), (masterpiece:1.3), (hires, high resolution:1.3), subsurface scattering, realistic, heavy shadow, ultra realistic, high resolution, (fcHeatPortrait:0.8), <lora:more_details:0.5>

Negative: (ugly, ugly face, average face, imperfect complexion:1.3), nude, naked, nsfw, nipples, vagina, pussy, topless, bare breasts, (low quality:2), (normal quality:2), (lowres, low resolution:2), (FastNegativeV2:0.75), BadDream, (UnrealisticDream:1.25), (bad-hands-5:1.25), (cowboy:1.2), (medium shot:1.2)

Seed: 612507832

I sure do enjoy my model lol...

Once you install ADetailer, you will see it as an extra panel in your settings. The yellow arrow points to the ADetailer model we want to scan with, in this case face_yolov8n.pt. This tells ADetailer to scan for faces and then apply the positive prompt (green arrow) and negative prompt (red arrow) via inpainting.

Let's change her eyes to blue using ADetailer.

Here is the result, after generating the image with ADetailer, with these prompts:

Prompt: feminine face, blue eyes

Negative: BadDream, UnrealisticDream

Remember, there is no mention of blue eyes anywhere in our original prompt, and the original seed image did not have blue eyes. Those blue eyes came purely from ADetailer. But also compare the two faces: ADetailer touched up the face in the second image as well, which now looks a little more airbrushed.

Now, I actually think the eyes are a little TOO blue, so let's try to make them more realistic. Let's adjust the weight to make them slightly less blue.

Prompt: feminine face, (blue eyes:0.5)

Negative: BadDream, UnrealisticDream

Much better, but I liked the original face, if I'm honest. Let's change the model from face_yolov8n to mediapipe_face_mesh_eyes_only. This ADetailer model will only scan for eyes and touch those up, leaving the face details intact. Let's try it. But since this isn't the face model anymore, we need to edit our prompt.

Prompt: blue eyes

Negative: BadDream, UnrealisticDream

Look at that, flawless blue eyes, and it was so easy to change because I only touched ADetailer's settings.

If you look at the first ADetailer screenshot I posted, you will notice there is a "2nd" tab, so we can run a second ADetailer model.

Just like with the face and eyes, let's specify a positive and a negative prompt, but this time we are going to select the hand_yolov8n.pt model. Let's also turn up the detection model confidence threshold, because this model will classify a lot of things as hands. Also, notice the (bad-hands-5:1.5). Because we are inpainting only the hands, we can really crank up the power on this TI.

Now let's generate the image again.

It's hard to tell from these compressed images how much better the ADetailer version is. So here are the final two versions and their full res. I suggest you open both in new tabs and flip between the two images, looking at the eyes and hands.

Base Version (Full Res)

ADetailer (Full Res)

Insight: If you ever see an image with perfect hands and eyes, it was probably made with ADetailer. It makes life so much easier when it comes to refining the things that Stable Diffusion is known to have trouble with (faces, eyes, hands).

Seed + Extra Seed

Now, I have a confession. I made RestlessExistence solely for its ability to generate great Playboy Bunny Costumes (that's not true, but it does generate a great bunny girl). For example:

Prompt: (cowboy shot:1.4), (3/4 shot:1.4), 1girl, long hair, ponytail, blonde hair, (black one piece:1.2), (satin leotard costume:1.2), (traditional playboy bunny suit:1.3), (seamless full black pantyhose:1.6), (bunny ears:1.3), (detached collar:1.2), (bow tie:1.2), white cuffs, (french satin wrist cuffs:1.4), (bare shoulders:1.3), (strapless:1.3), (bunny tail:1.6), (high heels with ankle strap:1.1), makeup, mascara, rosy cheeks, smokey eyes, lipstick, (beautiful, gorgeous:1.1), flirty, smirk, tease, playful expression, (looking at viewer:1.4), sitting, leaning back, reclined, (legs crossed:1.3), hands behind head, detailed background, balcony, modern city skyline, skyscrapers, glass buildings, bustling city, (night, midnight, night sky:1.25), cinematic composition, colorful, (octane render:1.3), (masterpiece:1.3), (hires, high resolution:1.3), subsurface scattering, realistic, heavy shadow, ultra realistic, high resolution, (fcDetailPortrait:0.75), <lora:more_details:0.5>

Negative: (cowboy:1.2), (medium shot:1.2), (seams:1.5), (garters:1.4), (thigh high stockings:1.7), (thighhighs:1.4), (thighband:1.4), (stay-ups:1.4), (hold-ups:1.4), (stockings with thigh bands:1.4), (intricate leotard:1.1), (latex:1.3), (opera gloves:1.5), (detached sleeves:1.5), (bardot:1.4), (gloves:1.4), lingerie, zipper front, (ugly, ugly face, average face, imperfect complexion:1.3), (pussy:1.4), (simple background:1.4), (low quality:2), (normal quality:2), (lowres, low resolution:2), (FastNegativeV2:0.75), BadDream, (UnrealisticDream:1.25), (bad-hands-5:1.25)

Seed: 64572525

This is almost perfect, with the exception that the wrist cuffs and collar are black, not white. So let's use the Extra seed function in A1111 and see if we can't squeeze out some white cuffs and a white collar, while keeping the composition (and prompt) just the way it is.

Now, what this does is generate tiny changes on top of your original seed. It's basically a second noise source blended on top of your original seed's noise: you control how strong the blend is (the variation strength) and which seed that extra noise comes from. What we are doing here is telling the extra seed function to generate roughly a 5% change on top of the original image. Hopefully this is enough to get some white cuffs, if we just let it run.
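For the curious, here is a rough sketch of the idea behind variation seeds as I understand A1111's implementation: the initial latent noise from the base seed and from the extra (variation) seed are spherically interpolated, and that blended noise is what the sampler denoises. Illustrative only, not the webui's actual code.

# Sketch of variation seeds: blend the base seed's starting noise with the
# variation seed's noise, then hand the blend to the sampler.
import torch

def slerp(t, a, b):
    """Spherical interpolation between two noise tensors (illustrative)."""
    a_flat, b_flat = a.flatten(), b.flatten()
    dot = torch.dot(a_flat / a_flat.norm(), b_flat / b_flat.norm()).clamp(-1, 1)
    omega = torch.acos(dot)
    return (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)

shape = (4, 96, 64)  # latent shape for a 512x768 image (illustrative)

base_noise = torch.randn(shape, generator=torch.Generator().manual_seed(64572525))
extra_noise = torch.randn(shape, generator=torch.Generator().manual_seed(1039736832))

# Variation strength 0.05 is roughly "generate a ~5% change on top of the original".
blended = slerp(0.05, base_noise, extra_noise)
# "blended" would then be used as the starting noise in place of base_noise.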

Whenever I do this, I always turn off hires fix and ADetailer, because I am looking for the seed to give me something specific. If I get the white cuffs, I can reuse the extra seed and then turn hires fix and ADetailer back on.

But first, lets let it work.

...and after 23 attempts, I got one with a white collar. Remember, this image has no hires fix or ADetailer, so the quality is lacking; however, you can see that it's still very similar to the base image.

Let's reuse the variation seed (1039736832) that gave me this white collar and reduce the strength to something like 0.02, so we retain as much of the base image as possible. I'm also going to turn on hires fix and ADetailer.

It's certainly very good, but the eyes and hands aren't quite perfect. Let's play around with the variation strength a bit and see if we can't get some better hands and eyes.

Let's try a variation strength of 0.04.

(Full Res)

I'm going to go with this image. If I want to touch this image up even more, I'll have to resort to either more prompt refinement, rolling through more seeds, straight up inpainting, or other more advanced methods.

Insight: Using Seed + Extra seed can sometimes help you round off rough edges on the images you generate. If you generate an image that's almost perfect, try the extra seed function and see if you can't get the model to generate an image that's just different enough to keep its original luster while removing the unwanted rough edge.

In Closing

I want to say thank you if you read this whole thing. I worked hard on this guide. This is what I wish had existed when I started out on my Stable Diffusion journey. These are the things I had to learn the hard way through bitter trial and error, so hopefully they can help you. I am open to any constructive criticism you have, as I want this guide to be the best that it can be. Seriously, if something doesn't make sense, please let me know, and I'll do my best to fix it.

Now go forth and create quality lewds like the Stable Diffusion Gods intended.


----

Updates:

2024-MAY-23: Updated URL for LowRA.

2024-JUNE-16: Updated Making Images Great Again
