
Just Another Image Generation Guide for Beginners (Pony Diffusion V6 XL)


Disclaimer: I'm not an image generation expert, so feel free to discuss any mistakes in the comments.

  • Check out my new article about what is popular in Pony Diffusion: https://civitai.com/articles/8886. It explains more concepts, such as CFG Scale, Sampler, Steps, and LoRAs!

I didn’t invent anything; there are more detailed tutorials available than this one. My goal was to write the kind of article I wish I had when I first started generating images - short and filled with visuals and links. So, here it is!

TL;DR:

The quickest way to create cool images for people who, like me, lack imagination:

1. Find an image on Civitai in the style you like

2. Click Remix

3. Open Danbooru: https://danbooru.donmai.us/posts

4. Find an image you want to use as inspiration

5. Copy the tags on the left

6. Use those tags in your positive prompt, e.g. "score_9, score_8_up, score_7_up, score_6_up, [copied tags without the post counts]"
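
Step 6 can be sketched in a few lines of Python. This is only an illustration: the function name `build_positive_prompt` is made up here, and it assumes you have already stripped the post counts from the copied tags.

```python
# Score tags from the example prompt in step 6.
SCORE_PREFIX = ["score_9", "score_8_up", "score_7_up", "score_6_up"]

def build_positive_prompt(tags):
    """Join the score tags and the copied tags into one comma-separated prompt.

    Hypothetical helper: `tags` is a list of tag strings without post counts.
    """
    cleaned = [t.strip() for t in tags if t.strip()]  # drop blanks and stray spaces
    return ", ".join(SCORE_PREFIX + cleaned)

print(build_positive_prompt(["1girl", "blue skin", "horns"]))
# score_9, score_8_up, score_7_up, score_6_up, 1girl, blue skin, horns
```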

This is how I was inspired to create my most popular image (It is NSFW, so I moderated it):

Original image: https://danbooru.donmai.us/posts/5886691

BOORU

What are booru tags, and why should I care how many images each one has? Or: how to figure out which words you can use in a prompt.

One of the main questions I had when I first started generating images was which words I could put in the prompt and which of them the model would understand.

For example, why can you generate a black mask but not a green mask?

It may not be obvious to beginners, but Pony Diffusion V6 XL was trained on images from booru imageboards.

You can find several of them with a quick search, but here is the booru I use: https://danbooru.donmai.us/posts?tags=black_mask+&z=5

It has 6.3k images tagged black_mask, but only 203 tagged green_mask.

My rule of thumb is that a tag should have at least 1k images to be recognized. If it has at least 3k, it's almost always safe to use.
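
The rule of thumb can be written down as a tiny check. This is a sketch of my own thresholds, not anything official about the model; the function name and the three labels are invented for illustration, and you look up the post count on the booru yourself.

```python
def tag_confidence(post_count):
    """Classify a booru tag by its post count, using the 1k / 3k rule of thumb.

    Hypothetical helper: the thresholds are the article's rule of thumb,
    not a documented property of Pony Diffusion V6 XL.
    """
    if post_count >= 3000:
        return "almost safe"
    if post_count >= 1000:
        return "likely recognized"
    return "risky"

print(tag_confidence(6300))  # black_mask -> almost safe
print(tag_confidence(203))   # green_mask -> risky
```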

Quick and long ways to learn booru tags

There are cool articles on Civitai that will help you quickly understand which booru tags you can use, e.g.:

https://civitai.com/articles/6349/280-pony-diffusion-xl-recognized-clothing-list-booru-tags-sfw

https://civitai.com/articles/7323/pony-realism-vs-danbooru-handsfingers-tags-most-non-sexual-related-wip

https://civitai.com/articles/5150/danbooru-tagging-visualization-for-ponyxl-autismmix

But your real friend is this list of tag groups:

https://danbooru.donmai.us/wiki_pages/tag_groups

You can use it to learn what words the model understands, from different clothes to poses and gestures.

I often use it for inspiration or just when I forget the correct wording.

Fast way to convert booru tags to Civitai prompts

The full story is in this article:

https://civitai.com/articles/2113/regex-for-quick-conversion-of-booru-tags-to-sd-prompts

The regex didn't work for me (Firefox), so I used ChatGPT to create a new one.

Short story:

1. Copy the tags

2. Open URL: https://regex101.com/r/zLcDno/1

3. Paste the tags into TEST STRING

4. Copy the parsed tags from the result below
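
If you prefer to do the same conversion offline, here is a rough Python sketch. It is not the regex from the link above. It assumes each copied line looks like "? black mask 6.3k" (an optional "?", the tag, then a post count); the pattern and the function name `booru_to_prompt` are my own guesses for illustration, so adjust them if your copied text looks different.

```python
import re

# Assumed line shape: optional "?", the tag name, then a count like 203, 6.3k or 6.5M.
TAG_LINE = re.compile(r"\??\s*(.+?)\s+\d+(?:\.\d+)?[kM]?")

def booru_to_prompt(copied_text):
    """Turn copied booru tag lines into a comma-separated prompt string."""
    tags = []
    for line in copied_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        match = TAG_LINE.fullmatch(line)
        # Lines without a trailing count are kept unchanged.
        tag = match.group(1) if match else line
        tags.append(tag.replace("_", " "))  # prompts use spaces, not underscores
    return ", ".join(tags)

sample = "? 1girl 6.5M\n? black mask 6.3k\n? green_mask 203\n"
print(booru_to_prompt(sample))
# 1girl, black mask, green mask
```

Lines that do not end in a count, such as a section heading you copied by accident, pass through as-is, so check the result before pasting it into the prompt.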

Prompt Engineering Tips

With longer prompts you lose control over details

I've heard the opinion that the longer the prompt, the better the control. In reality, it's the opposite: the more words you add, the bigger the chance that the model will ignore one of them.

It's up to you to decide which image you like more, but you can't deny that the second image has no horns.

The model's attention is on the first words

The last words influence the image much less than the first.

This is how it works:

I just moved "blue skin, blue oni, facial tattoo" to the top, and the horns came back!
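
Reordering by hand gets tedious with long prompts, so here is a small sketch that does the move for you. The helper name `move_to_front` is invented for illustration, and it treats every comma-separated tag the same way, so it will also push the chosen tags ahead of the score tags unless you list those first as well.

```python
def move_to_front(prompt, important):
    """Move the tags listed in `important` to the start of a comma-separated prompt.

    Hypothetical helper: tags that are not in the prompt are ignored,
    and the remaining tags keep their original order.
    """
    tags = [t.strip() for t in prompt.split(",") if t.strip()]
    front = [t for t in important if t in tags]
    rest = [t for t in tags if t not in front]
    return ", ".join(front + rest)

prompt = "1girl, horns, smile, blue skin, blue oni, facial tattoo"
print(move_to_front(prompt, ["blue skin", "blue oni", "facial tattoo"]))
# blue skin, blue oni, facial tattoo, 1girl, horns, smile
```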

More control over priority

Of course, thinking about word order is not fun, so an easier way to influence the priority is to use special symbols:

(blue oni) - gives +10% priority to "blue oni"

(blue skin:1.2) - gives +20% priority to "blue skin"
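
If you build prompts in a script, the two forms above are easy to generate. This is just a formatting sketch; the helper name `weighted` is made up, and it only produces the text, it does not check what your generator actually does with it.

```python
def weighted(tag, weight=None):
    """Wrap a tag in the priority syntax shown above.

    Hypothetical helper: no weight gives "(tag)", the +10% form;
    an explicit weight gives "(tag:weight)".
    """
    if weight is None:
        return f"({tag})"
    return f"({tag}:{weight:g})"  # :g drops trailing zeros, e.g. 1.20 -> 1.2

print(weighted("blue oni"))        # (blue oni)
print(weighted("blue skin", 1.2))  # (blue skin:1.2)
```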

But think of this method as a quick hack, not as your main tool.

Did you notice that the facial tattoos disappeared? You can play this game for a long time. This is why I prefer to apply these methods in this order:

1. Try to make the prompt shorter

2. Move important words up

3. Use the priority hack

Negative Prompts

Not for improving quality

Pony Diffusion V6 XL's creator explicitly states this: "The model is designed to not need negative prompts in most cases."

This is disappointing, but asking the model not to generate six fingers won't work:

Change the style

Negative prompts can still be used to define your style, so you don't need to extend your positive prompt:

Hide something

You can also use them to fine-tune an image a little by removing objects you don't want to see:

Negative prompt I use

score_6, score_5, score_4, pony, furry, monochrome, curvy, fat, pubic hair, watermark, artist name, ugly, ugly face, mutated hands, low res, bad anatomy, bad eyes, blurry face, unfinished, sketch, greyscale, (deformed)

Funny enough, I don't follow my own recommendations. Each time I try to remove something from this negative prompt, I don't like the result. What I dislike is the change in style; the images still have mutated hands and watermarks either way.
