
Making Images Great Again! REMASTERED


START ➲ GENERATION GUIDE

https://civitai.com/models/926443/ntr-mix-or-illustrious-xl-or-noob-xl <------------------

Models like Pony will be added in the future. NTR Mix is very beginner friendly, in the sense that creating high-quality and consistent images is extremely easy, which is why I'm using it.



Subject | Composition/Scene | Quality


Refer to Prompt Organization for more info.


The closer a token is to the beginning of the prompt, the higher its priority will be.


Want to force a weight change?

( Thing1: WEIGHT )

The weight/emphasis of that token or prompt will be nudged by 0.1 in either direction.

You can also type in your own value manually, since the format is plain text.

This feature works in the on-site generator.
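
To make that 0.1 nudge concrete, here's a toy sketch of what it does to the text (a hypothetical helper, not the generator's actual code):

```python
import re

def bump_weight(prompt_piece: str, direction: int = +1) -> str:
    """Nudge the weight inside "(token:WEIGHT)" by +/-0.1 in either direction."""
    match = re.fullmatch(r"\((.+):([\d.]+)\)", prompt_piece)
    token, weight = match.group(1), float(match.group(2))
    return f"({token}:{round(weight + 0.1 * direction, 2)})"

print(bump_weight("(blue eyes:1.0)"))      # (blue eyes:1.1)
print(bump_weight("(blue eyes:1.0)", -1))  # (blue eyes:0.9)
```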


Refer to the section: Prompting Extras for more info


Common Tags:

  • Subject Descriptors:

    • "1girl": Specifies a single female subject.

    • "portrait": Focuses on a person's face or upper body.

    • "landscape": Depicts natural scenery.

  • Artistic Styles:

    • "digital art": Generates an image in a digital art style.

    • "oil painting": Emulates a traditional oil painting appearance.

    • "watercolor": Produces a watercolor painting effect.

  • Quality Enhancers:

    • "highly detailed": Encourages intricate details in the image.

    • "4k": Suggests a high-resolution output.

    • "HDR": Implies high dynamic range for vivid colors and contrasts.

  • Lighting and Environment:

    • "cinematic lighting": Promotes dramatic and film-like lighting.

    • "studio lighting": Indicates controlled, professional lighting conditions.

    • "bokeh": Adds a blurred background effect, enhancing depth of field.

  • Art Platforms and Trends:

    • "trending on ArtStation": Aims for styles popular on the ArtStation platform.

    • "deviantart": Targets styles common on DeviantArt.

  • Camera Perspectives:

    • "close-up": Specifies a near view of the subject.

    • "wide shot": Requests a broad view, capturing more of the scene.


https://a13jm.github.io/MakingImagesGreatAgain_Library/

Fold all and move to top: Shift + F

Search from 201,018 tags and 167 libraries!



Basic Settings:


Let's set our Sampling Method to DPM++ 2M. This sampler tends to give the best results with this model.

And let's also change our Schedule Type to Karras for the best results; Beta works as well.

Let's also set our Sampling Steps to 30; this ensures a quick but accurate enough generation.

Since this model is an SDXL model, here are my recommended aspect ratios for the best results:

(The pixel count is approximately 1,048,576, with the following recommended resolutions for common aspect ratios.)

1024 x 1024 - 1:1

1254 x 836 - 3:2

1365 x 768 - 16:9

Want to use any ratio without having to calculate the pixel count yourself?

Check out my Aspect Ratio Converter:

https://a13jm.github.io/MakingImagesGreatAgain_Ratio/
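
If you'd rather compute it yourself, here's a minimal sketch of the same calculation, assuming a target of roughly 1,048,576 pixels and dimensions rounded to multiples of 8 (a common requirement for SDXL-family models):

```python
import math

def sdxl_resolution(rw: int, rh: int, target_pixels: int = 1024 * 1024) -> tuple[int, int]:
    """Width/height near target_pixels for a rw:rh ratio, rounded to multiples of 8."""
    scale = math.sqrt(target_pixels / (rw * rh))
    return round(rw * scale / 8) * 8, round(rh * scale / 8) * 8

print(sdxl_resolution(1, 1))    # (1024, 1024)
print(sdxl_resolution(3, 2))    # (1256, 840)  -- close to the 1254 x 836 above
print(sdxl_resolution(16, 9))   # (1368, 768)  -- close to the 1365 x 768 above
```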

And you can keep the CFG Scale at 6 for now; this determines the model's "creativity". The higher the number, the more closely it follows your prompt, but it will also distort your image more.
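
For anyone using the diffusers library instead of a web UI, here's a rough sketch of the same settings. The checkpoint filename is a placeholder for a local SDXL model file, and DPMSolverMultistepScheduler with Karras sigmas is diffusers' approximate equivalent of DPM++ 2M + Karras:

```python
import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionXLPipeline.from_single_file(
    "ntrMIXIllustriousXL_xiii.safetensors",  # local SDXL checkpoint (placeholder path)
    torch_dtype=torch.float16,
).to("cuda")

# DPM++ 2M with the Karras schedule
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

image = pipe(
    prompt="masterpiece, high quality, 1girl, looking at viewer",
    negative_prompt="worst quality, low quality, bad anatomy",
    num_inference_steps=30,   # Sampling Steps
    guidance_scale=6,         # CFG Scale
    width=1024, height=1024,  # 1:1 at ~1 megapixel
).images[0]
image.save("output.png")
```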


Generating an Image ➲

First, let's start off with our idea. But how do we get there? Maybe it changes as you go along, but to start, you need a base to work off of.

I like to start with my quality tags and negative prompt and build off that.

✦ masterpiece, high quality, absurdres, best quality, very aesthetic, newest,

⊘ worst quality, ugly, low quality, bad anatomy, bad hands, ugly face

Next we'll add in our subject.

You may notice that the way I prompt doesn't align with the organization from Basic Prompting.

I've prompted using Quality | Composition/Scene | Subject for a long time and found that it works best for what I want to achieve. I would highly suggest you experiment with different variations of Prompt Organization to find what best suits your needs. For more information on this subject, refer to Prompt Organization.

Icon Meanings:

✦ ---> Quality Section
✾ ---> Subject Section
⊘ ---> Negative Prompt
✏ ---> Added Prompts/Tokens
✿ ---> Environment Section

✦ masterpiece, high quality, absurdres, best quality, very aesthetic, newest,

✾ 1girl, Mai Sakurajima, ✏
⊘ worst quality, ugly, low quality, bad anatomy, bad hands, ugly face

Then hit GENERATE ➲



This isn't bad, but I want the character to resemble her a little more closely. Let's add in some of her key features and what we want her to be doing.

✦ masterpiece, high quality, absurdres, best quality, very aesthetic, newest,

✾ 1girl, Mai Sakurajima,
│
└➝ blush, looking at viewer, large breasts, bunny outfit, hands on face, ✏
⊘ worst quality, ugly, low quality, bad anatomy, bad hands, ugly face

Notice how bunny outfit doesn't generate a bunny outfit here as you'd expect, though it does in the next image. That's because this specific seed generates noise that resembles a sweater more than a bunny outfit; the images above and below are similar because they share the same seed. This means we either need a new seed or need to change the environment more. Since I'm running the same seed, let's do the latter.
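
The seed point is easy to verify: the seed fully determines the starting noise, so the same seed reproduces the same latents every time. A tiny PyTorch demonstration (the tensor shape here is arbitrary):

```python
import torch

# Same seed -> identical starting noise -> the diffusion walks the same path.
gen_a = torch.Generator().manual_seed(1234)
gen_b = torch.Generator().manual_seed(1234)

noise_a = torch.randn(4, 128, 128, generator=gen_a)
noise_b = torch.randn(4, 128, 128, generator=gen_b)

print(torch.equal(noise_a, noise_b))  # True: to change the output, change the prompt
```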



Alright, so let's add some scene details to widen the dataset and bring in more variety.

✦ masterpiece, high quality, absurdres, best quality, very aesthetic, newest,

✿  from below, backlighting, abstract, cinematic, colorful, (portrait:0.8), dutch angle, ✏

✾ 1girl, Mai Sakurajima, blush, looking at viewer, large breasts, bunny outfit, hands on face,
⊘ worst quality, ugly, low quality, bad anatomy, bad hands, ugly face

(portrait:0.75) "What do those parentheses and that colon do? What's that number mean?"

In the Stable Diffusion Web UI, by highlighting text and pressing either CTRL + UP or CTRL + DOWN, you can increment the weight of the prompt by 0.1, or just enter the value manually.

( Thing1: Weight ) A weight below 1 de-emphasizes the token, while a weight above 1 emphasizes it.

Refer to Weight Adjustment for more information.



There we go, that's what we were looking for!


This is a process of adding and generating over and over, fine-tuning the prompt to give the desired output.

The same goes for the negative prompt as well. See something you don't like? Complain about it to the negative prompt!



This looks good, but we can make it even better! I introduce to you... Inpainting!



Let's send our image into Inpaint and work on it some more there.


Don't know how? Refer to the video below!



Once you're inside Inpaint, here are the settings I'd recommend and why I recommend them.

Soft Inpainting = True

Mask Blur = 4

Inpaint Area = Only Masked

Denoising Strength = 0.45 - 0.6

Seed = -1

Aspect Ratio = Press: [📐]

Sampling Steps = 60

Sampling Method = DPM++ 2M

Scheduling Type = Karras

  1. Soft Inpainting. This will allow the masked area to blend seamlessly with the unmasked area.

  2. Mask Blur. This is the feather of the mask's edge, higher numbers mean a larger fade.

  3. Inpaint Area. Having it set to Only Masked means that only the area in the mask is altered.

  4. Denoising Strength. Ranging from 0-1, this determines how much the image is altered.

  5. Seed. Make sure this is set to -1 so you aren't repeating the same generation.

  6. Aspect Ratio. This is the size/shape of the Inpainting area.

  7. Sampling Steps. This is how many steps it takes to diffuse the image. Higher numbers will generally give a higher-quality generation, up to a point.

  8. Sampling Method and Scheduling Type. I like to set this to a more slow and accurate sampler and scheduler, such as DPM++ 2M and Karras.
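
As a rough programmatic equivalent, here's how those settings map onto diffusers' inpainting pipeline. This is a sketch, not the web UI's internals; the checkpoint and image paths are placeholders:

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",  # any SDXL inpaint checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = load_image("base.png")      # the previously generated image (placeholder)
mask = load_image("face_mask.png")  # white = area to repaint (placeholder)

result = pipe(
    prompt="masterpiece, high quality, 1girl, Mai Sakurajima, blush, looking at viewer",
    negative_prompt="worst quality, ugly, low quality, bad anatomy, bad hands, ugly face",
    image=image,
    mask_image=mask,
    strength=0.5,            # Denoising Strength (the 0.45 - 0.6 range above)
    num_inference_steps=60,  # Sampling Steps
    guidance_scale=6,        # CFG Scale
    # no generator passed -> a fresh random seed each run, like the web UI's -1
).images[0]
result.save("inpainted.png")
```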


Let's mask the face and change our prompt. Remember: the more area masked, the lower the quality of the inpaint, so only inpaint what you have to.



✦ masterpiece, high quality, absurdres, best quality, very aesthetic, newest,

✾ 1girl, Mai Sakurajima, blush, looking at viewer, ✏

⊘ worst quality, ugly, low quality, bad anatomy, bad hands, ugly face

It's always good practice to start from the bare minimum and build back up your positive and negative prompts each inpaint to ensure you get exactly what you want.

Your masked section is acting as a new canvas, your prompt is what will be generated in that area.


GENERATE ➲




You can repeat this process over and over as many times as you'd like anywhere in the image!

But wait, how do I get the image back onto the canvas? Simply click and hold the generated image, then drag it back onto the canvas.




I would highly recommend you play around with the Denoising Strength depending on what you're changing.



It can be, but you can take it one step further! (You can also edit externally and then inpaint the image to blend it in.)



Using Photopea you can enhance your images even more for free!


And voilà, you've done it! You have learned the basics of creating your first image!

This article will be receiving up-to-date information on how to create the best images possible, utilizing every aspect of Stable Diffusion.


I hope to see you here again! Happy Generating!


Extra Information:


Model: ntrMIXIllustriousXL_xiii

Batch 1: (Not in this order)

Subject:
1girl, looking at viewer, sweater, blue eyes, black hair,

Environment:
bedroom, soft lighting, depth of field,

Quality:
masterpiece, high quality, absurdres, best quality, very aesthetic, newest,

Batch 2: (Not in this order)

Subject:
1girl, looking at viewer, demon girl, red skin, horns, green eyes, bodysuit, black hair, long hair,

Environment:
city, backlighting, depth of field, from below,

Quality:
masterpiece, high quality, absurdres, best quality, very aesthetic, newest,

Batch 3: (Not in this order)

Subject:
1girl, laying on bed, looking at viewer, dark skin, freckles, brown hair, brown eyes, medium hair, orange crop-top, breasts,

Environment:
bedroom, soft lighting, depth of field,

Quality:
masterpiece, high quality, absurdres, best quality, very aesthetic, newest,


Conclusion from data regarding organization:

In most cases, whether you put quality or subject tokens first won't make a substantial difference. As long as the environment token isn't first, you likely won't notice much of a difference.

And in general, the position of the quality tokens does not significantly impact the overall quality of the image.


This is just a way to stay organized; it is not a rule. Anything closer to the start is prioritized more.

By this logic, the best way to position your tokens would be: Subject | Environment | Quality

(Conclusions are made off ntrMIXIllustriousXL_xiii)


How it works:

In the default Stable Diffusion pipeline, prompts are limited to 75 tokens due to the CLIP text encoder's limitations. Any words beyond this limit are ignored, meaning that only the first 75 tokens influence the image. However, web UI implementations like AUTOMATIC1111 and Forge allow chunking, where prompts exceeding the limit are split into segments of 75 tokens and processed sequentially. This allows for longer prompts, but the impact of later segments diminishes. To optimize prompts, prioritize the most important details within the first 75 tokens and use tools like weighting (word:1.5) to emphasize important tokens.
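
To see where a prompt falls relative to the 75-token limit, you can count CLIP tokens directly. A minimal sketch using the transformers library and the standard SD text-encoder tokenizer:

```python
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = "masterpiece, high quality, absurdres, best quality, very aesthetic, newest"
ids = tokenizer(prompt).input_ids  # includes the start/end markers
print(len(ids) - 2, "tokens")      # subtract <|startoftext|> and <|endoftext|>
```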


Sources:

https://stable-diffusion-art.com/prompt-guide/

https://github.com/huggingface/diffusers/issues/2136


Syntax:

Source: https://aienthusiastic.com/stable-diffusion-prompt-grammar-syntax-weights/


Classic Weight Adjustment:

parentheses: (a rainy day) ⇡ 10%

parentheses: (((((a rainy day))))) ⇡ 50%

square brackets: [a rainy day] ⇣ 10%

square brackets: [[[[[a rainy day]]]]] ⇣ 50%


Not possible in the on-site generator.
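
Note that in AUTOMATIC1111's actual implementation each nesting level applies a multiplicative factor of 1.1 (and each bracket level divides by 1.1), so the stacked percentages above are approximations. A one-liner makes the stacking explicit:

```python
def nested_weight(levels: int, emphasize: bool = True) -> float:
    """Effective attention weight after `levels` of ( ) or [ ] nesting in A1111."""
    factor = 1.1 if emphasize else 1 / 1.1
    return factor ** levels

print(nested_weight(1))         # 1.1   -> +10%
print(nested_weight(5))         # ~1.61 -> five parentheses stack multiplicatively
print(nested_weight(5, False))  # ~0.62 -> five brackets
```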

Forms of Prompt Scheduling:

[Thing1: Thing2: Ratio]

Ratio controls at which step Thing1 is switched to Thing2. It is a number between 0 and 1.

This means that if you use a value of 0.5 while running 30 steps, Thing1 will run from steps 1 through 15, and Thing2 will run from steps 16 through 30.

[Thing1:STEP]

This delays a prompt until the specified STEP. For example, with [a cow:5], the generation introduces the new prompt "a cow" at step 5.

[Thing1::STEP]

From step 1 until STEP, Thing1 will be used to generate the image. For example, with [a cow::5], "a cow" will no longer be used to generate the image after step 5.

Inputting a value between 0 and 1 will act as a percentage marker rather than a specific step.
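
A tiny sketch of how those ratio/step values resolve (my own reading of the documented behaviour, not the web UI's actual code):

```python
def switch_step(spec: float, total_steps: int) -> int:
    """Resolve the number in [Thing1:Thing2:spec] to a concrete step."""
    if 0 < spec < 1:                 # fractions act as a percentage marker
        return round(spec * total_steps)
    return int(spec)                 # values >= 1 are a literal step number

print(switch_step(0.5, 30))  # 15 -> Thing1 for steps 1-15, Thing2 for 16-30
print(switch_step(5, 30))    # 5  -> e.g. [a cow:5] introduces "a cow" at step 5
```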


Comparing the effects of Comma (,), Full Stop (.), Semicolon (;), Pipe (|), and BREAK.


Common prompt structure: (# = Variable)

Prompt1# Prompt2

Analyzing Visual Differences Across Labeled Image Batches Using the Sum of Absolute Differences (SAD) and Structural Similarity Index (SSIM) Metrics

Abstract:

The experiment analyzed visual differences across ten batches using SAD and SSIM metrics, visualized in heatmaps with interpolated gradients. The heatmap showed that "BREAK" caused the largest composition differences, while other labels were marginally similar.

The experiment examined visual differences between labeled images across ten batches using the Sum of Absolute Differences (SAD) and Structural Similarity Index (SSIM) metrics. Each batch comprised five labeled images: ",", ".", ";", "pipe", and "BREAK". The "," image served as the principal for calculating the SAD values of the other four images within each batch. SAD, a pixel-wise measure of dissimilarity, was determined by summing the absolute differences between corresponding pixel intensities of the principal and compared images. SSIM, a perceptual measure of similarity, was also determined by evaluating the structural information, luminance, and contrast between corresponding pixel regions of the principal and compared images.
The resulting data was visualized in a heatmap to facilitate comparative analysis both within and across batches and image labels. To improve readability, the SAD and SSIM values were interpolated to create smoother gradients, and a plasma colormap was applied to represent the intensity of differences. In the heatmap, the vertical axis represented the batches (B1 to B10), while the horizontal axis denoted the labeled images.
Key takeaways of the analysis included the identification of patterns in visual differences within and between batches, as well as the detection of potential outliers with significantly high SAD values or low SSIM values.
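
For the curious, here's a minimal version of the two metrics as described above, using NumPy and recent scikit-image (the image filenames are placeholders; assumes same-sized RGB images):

```python
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

def load_rgb(path: str) -> np.ndarray:
    return np.asarray(Image.open(path).convert("RGB"))

principal = load_rgb("comma.png")  # the "," image serves as the principal
compared = load_rgb("break.png")   # e.g. the "BREAK" image from the same batch

# SAD: sum of absolute pixel differences (higher = less similar)
sad = np.abs(principal.astype(np.int64) - compared.astype(np.int64)).sum()

# SSIM: perceptual similarity over luminance/contrast/structure (lower = less similar)
ssim = structural_similarity(principal, compared, channel_axis=2)

print(f"SAD:  {sad:,}")
print(f"SSIM: {ssim:.5f}")
```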

Higher values = less similar.

https://i.ibb.co/1Lfbvdq/SAD-Map.png


Lower values = less similar.

https://i.ibb.co/s94WJmm/SSIM-Map.png


SAD Table:

Deviation from Principal:

Average Value: 31,598,675

Full Stop Average: 27,744,665

Semicolon Average: 27,570,758

Pipe average: 30,207,754

BREAK Average: 40,871,523


SSIM Table:

Deviation from Principal:

Average Value: 0.608085

Full Stop Average: 0.62687

Semicolon Average: 0.63123

Pipe average: 0.62022

BREAK Average: 0.55402


Experiment Data:

https://www.mediafire.com/file/e518w2runacwo4y/Data.zip/file


Link to the GitHub repo for SSIM and SAD:

https://github.com/A13JM/SSIM_SAD

Based on the heatmap, BREAK seems to create the biggest differences in image composition, whilst the rest are marginally similar.


As of right now, this is the only measurable statistic I can think of to compare syntaxes.

Since this is only a batch size of 10 (40 compared images), the results could be more robust. In the future I plan on trying a batch size of 50 (200 images).


Source: https://en.wikipedia.org/wiki/Sum_of_absolute_differences

Source: https://en.wikipedia.org/wiki/Structural_similarity_index_measure


Understand Tokenization:

https://platform.openai.com/tokenizer


Basics of AI:

https://www.theverge.com/24201441/ai-terminology-explained-humans


Special Thanks:

MarcianoTheVoyager

https://civitai.com/user/MarcianoTheVoyager

Helped refine and polish the article, as well as come up with unique and distinct ideas to further its progress.

