Need help understanding

Hi all. I love the art that is being produced by AI and posted on this site and everywhere else, for that matter. I built myself a new PC that can run Stable Diffusion, with a 12 GB graphics card. I've been trying and trying to produce art, but every time I do (I generate batches of 4), there is something "extra" or wrong in the image. Whether it is a third leg, an extra hand, a leg cut off at the knee, or multiple bodies meshed together in the same space, I just can't seem to get any images that satisfy me. I've even tried copying the settings from pictures on here, and I just don't get the image right. Is there a tutorial, or really a series of tutorials, that I can go through to better understand SD and all the parts that come with it?

4 Answers

You're probably not doing anything wrong.

Unfortunately, AI isn't perfect, and errors like this are very common. The perfect images you see posted online are a very small, heavily curated selection, handpicked from the cesspit of mechanical mistakes. From personal experience, at least 90% of the images I generate with Stable Diffusion wind up in the Recycle Bin.

If you really want a guide on image generation, I would suggest browsing through some of the articles other users have posted to Civitai on the topic.

As for me, I get better results after spamming the negative prompt with things I don't like.

You can put this in the negative prompt and check the result:

EasyNegative, (((White))) (((bad anatomy))), (liquid body), (liquid tongue), (((disfigured))), ((((malformed)))), (((mutated))) ((anatomical nonsense)), text font ui, error, ((malformed hands)), (long neck), ((blurred)), (((lowers))), ((lowres)), ((bad anatomy)), (bad proportions), (bad shadows), (uncoordinated body), ((unnatural body)), ((fused breasts)), bad breasts, poorly drawn breasts, extra breasts, liquid breasts, missing breasts, fused ears, bad ears, poorly drawn ears, extra ears, liquid ears, heavy ears, missing ears, fused animal ears, bad animal ears, poorly drawn animal ears, extra animal ears, liquid animal ears, heavy animal ears, missing animal ears, text, ui error, missing fingers, missing limb, fused fingers, one hand with more than 5 fingers, one hand with less than 5 fingers, watermark, username, blurry, jpeg artifact, signature, bad hair, poorly drawn hair, fused hair, liquid hair, grease hair, ugly hair, big muscles, ugly, bad faces, fused face, poorly drawn face, cloned face, big face, long face, bad eyes, poorly drawn eyes, fused eyes, extra eyes, ((bad mouth)), poorly drawn, mouth fused, mouth, quality low, quality normal, QR code, bar code. (((duplicate person))), bad teeth, poorly drawn teeth, fused teeth, teeth, quality low, out-of-frame, far_from_viewer, spread legs, low quality legs, bad quality legs, fused body parts, non-realistic, fake looking image, low quality hands, ugly hands, low quality fingers ugly fingers, (floating object), fused legs dismembered limbs mutation, mutated legs, mutated knees missing limbs, low detailed background out-of-frame, low detailed hair, weird background, pregnant, multiple_navel, 3 arms, 3 legs, 3 hands, extra limbs, floating object, floating limbs, disembodied limb, extra legs, shiny cloth, reflective cloth, 
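The nested parentheses in a list like this are emphasis markers. As a rough sketch, assuming the AUTOMATIC1111 web UI convention where each pair of parentheses multiplies a term's attention weight by 1.1 (other front ends may parse them differently), the effective weights can be worked out like this. The function names are made up for illustration, and terms with unbalanced or multiple parenthesis groups are simply left at weight 1.0.

```python
import re

# Matches a single term wrapped in zero or more parentheses, e.g. "((bad anatomy))".
_TERM = re.compile(r"^(\(*)([^()]*)(\)*)$")


def emphasis_weight(term, factor=1.1):
    """Return (bare_text, weight) for one comma-separated prompt term.

    Assumption: each balanced pair of parentheses multiplies the weight by `factor`.
    """
    term = term.strip()
    match = _TERM.match(term)
    if match is None or len(match.group(1)) != len(match.group(3)):
        # Unbalanced parentheses or several groups in one term: leave it untouched.
        return term, 1.0
    depth = len(match.group(1))
    return match.group(2).strip(), round(factor ** depth, 3)


def summarize_negative_prompt(prompt):
    """Split on commas, merge repeated terms, and keep the highest weight per term."""
    weights = {}
    for raw in prompt.split(","):
        if not raw.strip():
            continue  # skip empty pieces such as a trailing comma
        text, weight = emphasis_weight(raw)
        if text:
            weights[text] = max(weight, weights.get(text, 0.0))
    return weights


if __name__ == "__main__":
    sample = "((bad anatomy)), (((bad anatomy))), (long neck), blurry, blurry, "
    for text, weight in summarize_negative_prompt(sample).items():
        print(f"{weight:.3f}  {text}")
```

Running a long negative prompt through something like this makes it easier to spot repeated terms and to see how strongly each one is actually weighted.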

This is apparently something where my experience is miles apart from many people's. It isn't hard to get OK to good-looking outputs, and you don't need to generate a lot of images to get there.

You mentioned copying prompts from Civitai. First, I would suggest checking that all your settings are correct and that you are using the same checkpoint and any other models the original used. In theory, that gives you the exact same image. The problem is the settings that aren't visible in the image descriptions. Image dimensions and aspect ratio have a huge impact on the outcome. Also, people almost always use hi-res fix, which is a way of upscaling and adding detail; it can change the result just a tiny bit or make it look almost entirely different.

Image posted by StickyRicky

If I remember correctly, for this one I used a ratio of 512/712 and kept the variance in the hi-res fix pretty low. So I suggest trying to replicate this one without hi-res fix and checking whether it turns out similar. If it does, all your settings are correct.
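One way to make that check less error-prone is to write both sets of settings down and compare them field by field. The sketch below is only an illustration: the keys and every value except the 512/712 dimensions are hypothetical placeholders, not the actual settings of the image above.

```python
def diff_settings(reference, mine):
    """Return {key: (reference_value, my_value)} for every setting that differs."""
    mismatches = {}
    for key in sorted(set(reference) | set(mine)):
        ref_val = reference.get(key)
        my_val = mine.get(key)
        if ref_val != my_val:
            mismatches[key] = (ref_val, my_val)
    return mismatches


# Hypothetical settings copied from an image description (placeholder values).
reference = {
    "checkpoint": "example-model-v1",
    "sampler": "Euler a",
    "steps": 20,
    "cfg_scale": 7,
    "seed": 1234567890,
    "width": 512,
    "height": 712,
    "hires_fix": False,  # turned off for the comparison, as suggested above
}

# What I actually entered: same as the reference except for two fields.
mine = dict(reference, height=512, seed=-1)

for key, (ref_val, my_val) in diff_settings(reference, mine).items():
    print(f"{key}: reference={ref_val!r}, mine={my_val!r}")
```

In this made-up case the comparison flags the height and the seed, which are exactly the kind of differences that produce a completely different picture from the same prompt.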

Now for creating good images in general:

Unless the content you want to generate is complex (multiple people, people holding things, spread hands), you can almost always create something that works within 4 images with minimal negative prompts. You do not need additional models or embeddings. Instead, find the output that is closest to what you imagined or like most, and start fiddling with the settings. Changing the number of steps in particular works really well for finding the best variant of a seed.
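To make the step-count idea concrete, here is a minimal sketch of a sweep plan: the seed and everything else stay fixed, and only the number of steps changes. `GenerationJob` and `step_sweep` are hypothetical names, the prompt and seed are placeholders, and actually rendering each job in your UI or library of choice is left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GenerationJob:
    """One image request; field names are illustrative, not tied to any tool."""
    prompt: str
    negative_prompt: str
    seed: int
    steps: int
    width: int = 512
    height: int = 712


def step_sweep(base, step_values):
    """Return copies of `base` that differ only in the number of steps."""
    return [replace(base, steps=steps) for steps in step_values]


# The output you liked most, written down as a job (placeholder values).
base = GenerationJob(
    prompt="portrait of a woman, forest background",
    negative_prompt="lowres, blurry",
    seed=1234567890,
    steps=20,
)

# Same seed, steps from 15 to 40 in increments of 5.
jobs = step_sweep(base, range(15, 41, 5))

for job in jobs:
    print(f"seed={job.seed} steps={job.steps} size={job.width}x{job.height}")
```

Because the seed is held constant, any difference between the resulting images comes from the step count alone, which makes it easy to pick the best variant.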

Keep the prompts simple, and only ramp up once the short ones work.

Note that every model is different, and some of the most popular (like Abyss Orange Mix) can absolutely suck at hands or are more prone to creating fused limbs. When this happens, more often than not, adding specific prompts to prevent it does nothing. Honestly, I'm not sure why I still have the "bad hands" prompt in the negatives.

I'd be interested to see your results, what checkpoint you use, and what you are trying to generate. There might be some more useful tips I can give on a case-by-case basis.

So, taking what you have and your settings, this is what I got. It looks good, but when I have multiple people in my images, like a man and a woman or two or more females, things tend to get replaced, something extra gets added on, or an appendage is just sitting there on its own. Oh, I also don't have that LoRA that is in your settings. I know nothing about LoRAs, and I am seeing them in many of the published images on here.

