NOOBAI-XL FULL OFFICIAL GUIDE [TRANSLATED]

This translation was produced with Gemini Experimental 1206.

All links in the article have been replaced by placeholders; the original links are available in the original article.

1. Introduction

This document aims to provide a comprehensive and up-to-date introduction to the NoobAI-XL model.

Please note that due to the dynamic nature of information and the difficulty of maintenance, this document may contain errors or omissions.

1.1 Basic Introduction

NoobAI-XL is a text-to-image diffusion model developed by Laxhar Dream Lab and sponsored by BlueRun. The model license inherits from fair-ai-public-license-1.0-sd and includes the additional restrictions of the NoobAI-XL model license. The model is based on the SDXL architecture and uses Illustrious-xl-early-release-v0 as its base model. It has been trained for a large number of epochs on the complete Danbooru and e621 datasets (approximately 13,000,000 images in total), resulting in extensive knowledge and excellent performance.

1.2 Overview

NoobAI-XL possesses a vast amount of knowledge: it can reproduce tens of thousands of anime characters and artist styles, recognizes a large number of special concepts in the anime domain, and has extensive knowledge of furry content.

NoobAI-XL offers two versions: noise prediction and V-prediction. In short, the noise prediction version generates more diverse and creative images, while the V-prediction version adheres more closely to the prompts, producing images with a wider color gamut and stronger light and shadow effects.

NoobAI-XL has a growing ecosystem of community support, including various LoRAs, ControlNets, IP-Adapters, and more.

NoobAI-XL includes a series of models, mainly noise prediction and V-prediction, which will be described in detail later.

2. Quick Start

Before reading this section, readers should already be familiar with the basic usage of at least one UI such as WebUI, ComfyUI, forge, or reForge. Otherwise, please first learn from here or from the internet (such as Bilibili).

2.1 Model Download

Model Download Sites

  • CivitAI: Click here (Note: May require VPN)

  • LiblibAI: Click here

  • Huggingface: Click here (Note: May require VPN)

If you are unsure which model to download, you can browse here.

2.2 Model Loading

NoobAI-XL models are divided into two categories: noise prediction (epsilon prediction, or eps-pred for short) models and V-prediction (v-prediction, or v-pred for short) models. Models with "eps", "epsilon-pred", or "eps-pred" in their names are noise prediction models, which are not significantly different from other models. If you use them, you can skip this section. Models with "v" or "v-pred" in their names are V-prediction models, which are different from most conventional models. Please read the installation guide in this section carefully! For an introduction to the principles of V-prediction models, please refer to this article.

2.2.1 Loading V-prediction Models

V-prediction is a relatively rare model training technique, and models trained using this technique are called V-prediction models. Compared to noise prediction, V-prediction models are known for their higher prompt adherence, wider color gamut, and stronger light and shadow effects. Examples include NovelAI Diffusion V3 and COSXL. Due to their late emergence and the scarcity of such models, some mainstream image generation projects and UIs do not directly support them. Therefore, if you intend to use V-prediction models, some additional operations are required. This section will introduce their specific usage. If you encounter any difficulties during use, you can also directly contact any of the model authors for assistance.

a. Using in forge or reForge

forge and reForge are two image generation UIs developed by lllyasviel and Panchovix, respectively; both are extended versions of WebUI. Their main branches support V-prediction models, and they operate almost identically to WebUI, so they are recommended. If you have already installed one of them, simply run git pull in its installation directory to update, then restart. If not, refer to online tutorials for installation and usage.

b. Using in ComfyUI

ComfyUI is an image generation UI developed by comfyanonymous, allowing users to freely manipulate nodes, known for its flexibility and professionalism. Using V-prediction models in it only requires adding additional nodes.
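
As of this writing, this typically means inserting a ModelSamplingDiscrete node (with sampling set to v_prediction and zsnr enabled) between the checkpoint loader and the sampler; if your ComfyUI version differs, check its documentation for the current node name.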

c. Using in WebUI

WebUI refers to the project stable-diffusion-webui developed by AUTOMATIC1111. Currently, the main branch of WebUI does not support V-prediction models; you need to switch to the dev branch. Please note that this method is unstable and may have bugs. Improper use may even cause irreversible damage to WebUI, so please back up your WebUI in advance. The specific method is as follows:

  1. If you have not installed WebUI, please refer to online tutorials to install it;

  2. Open the console or terminal in your stable-diffusion-webui installation directory;

  3. Enter the command git checkout dev and press Enter;

  4. Restart WebUI.

d. Using in Diffusers

Diffusers is a Python library dedicated to diffusion models. This usage requires users to have a certain coding foundation and is recommended for developers and researchers. Code example:

import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler

ckpt_path = "/path/to/model.safetensors"
pipe = StableDiffusionXLPipeline.from_single_file(
    ckpt_path,
    use_safetensors=True,
    torch_dtype=torch.float16,
)
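# Switch the scheduler to v-prediction and enable Zero Terminal SNR
# rescaling; V-prediction checkpoints need both settings (see 2.2.1).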
scheduler_args = {"prediction_type": "v_prediction", "rescale_betas_zero_snr": True}
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config, **scheduler_args)
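# Optional: requires the xformers package; skip this line if it is not installed.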
pipe.enable_xformers_memory_efficient_attention()
pipe = pipe.to("cuda")

prompt = """masterpiece, best quality, john_kafka, nixeu, quasarcake, chromatic aberration, film grain, horror \(theme\), limited palette, x-shaped pupils, high contrast, color contrast, cold colors, arlecchino \(genshin impact\), black theme, gritty, graphite \(medium\)"""
negative_prompt = "nsfw, worst quality, old, early, low quality, lowres, signature, username, logo, bad hands, mutated hands, mammal, anthro, furry, ambiguous form, feral, semi-anthro"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=1216,
    num_inference_steps=28,
    guidance_scale=5,
    generator=torch.Generator().manual_seed(42),
).images[0]

image.save("output.png")
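
Note that the example's width and height (832x1216) match one of the recommended resolutions listed in section 2.3.2-c below.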

2.3 Model Usage

2.3.1 Prompts

NoobAI-XL has no strict requirements for prompts, and the recommended operations in this article are just icing on the cake.

NoobAI-XL recommends using tags as prompts to specify desired content. Each tag is an English word or phrase, and tags are separated by a comma ", ". Tags taken directly from Danbooru and e621 have stronger effects. For even better results, refer to the prompt specifications below.
We recommend always adding the aesthetic tag "very awa" and the quality tag "masterpiece" to the prompt.
NoobAI-XL supports generating highly accurate characters and artist styles, both triggered by tags, which we call "trigger words". For characters, the trigger word is their character name; for artist styles, the trigger word is the artist's name. The complete trigger word table can be downloaded from noob-wiki. A detailed introduction to trigger words can be found below.
Similar to NovelAI, NoobAI-XL supports special tags such as quality, aesthetics, creation year, creation period, and safety rating, which are used as auxiliary tools. Interested readers can find them in the detailed introduction below.
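
For illustration (an assumed example, not from the original article), a prompt following these recommendations might look like:

1girl, ganyu \(genshin impact\), genshin impact, blue hair, horns, smile, looking at viewer, masterpiece, best quality, very awa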

2.3.2 Generation Parameters

a. Basic Parameters

The list below recommends three generation parameters: Sampler, Sampling Steps, and CFG Scale. Bold indicates strongly recommended; bold red indicates a mandatory requirement, and using other parameter values may produce unexpected results.

Recommended Generation Parameters

  • All noise prediction versions:

    • Sampler: Euler A

      • cfg: 5~7

      • Sampling Steps: 28~35

  • V-prediction 0.9r version:

    • Sampler: Euler

      • cfg: 3.5~5.5

      • Sampling Steps: 32~40

    • Sampler: Euler A

      • cfg: 3~4

      • Sampling Steps: 38~40

  • V-prediction 0.75s version:

    • Sampler: Euler A

      • cfg: 3~4

      • Sampling Steps: 38~40

  • V-prediction 0.65s version:

    • Sampler: Euler A

      • cfg: 3~5

      • Sampling Steps: 28~40

  • V-prediction 0.6 version:

    • Sampler: Euler A or Euler

      • cfg: 3.5~5.5

      • Sampling Steps: 32~40

    • Sampler: Euler A

      • cfg: 5~7

      • Sampling Steps: 28~35

  • V-prediction 0.5 version:

    • Sampler: Euler

      • cfg: 3.5~5.5

      • Sampling Steps: 28~35

  • V-prediction test version:

    • Sampler: Euler A

      • cfg: 5~7

      • Sampling Steps: 28~35

b. V-prediction Model Precautions

For V-prediction models, the following settings are recommended to (i) optimize color, lighting, and detail; (ii) eliminate oversaturation and overexposure; and (iii) enhance semantic understanding.

  1. Enable the Rescale CFG parameter (around 0.7) if available (a Diffusers sketch follows this list). Some image generation UIs do not support it.

  2. Alternatively, use the Euler Ancestral CFG++ sampler and set CFG Scale between 1 and 1.8. Some image generation UIs do not support it.

  3. Due to compatibility issues, some samplers, such as the DPM series, may cause V-prediction models to generate images with excessive saturation or fragmented lines.
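
For Diffusers users, a minimal sketch of point 1 (assuming the pipe, prompt, and negative_prompt from the example in section 2.2.1-d): Rescale CFG corresponds to the guidance_rescale argument of the SDXL pipeline call.

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=1216,
    num_inference_steps=32,
    guidance_scale=4.5,
    guidance_rescale=0.7,  # Rescale CFG, around 0.7 as recommended
).images[0]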

c. Resolution

The resolution (width x height) of the generated image is an important parameter. Generally speaking, due to architectural constraints, all SDXL models, including NoobAI-XL, need specific resolutions to achieve the best results; deviating from them, even by a few pixels, weakens the quality of the generated image. The recommended resolutions for NoobAI-XL are as follows:

Recommended Resolutions

  • Resolution (width x height): 768x1344, Ratio: 9:16

  • Resolution (width x height): 832x1216, Ratio: 2:3

  • Resolution (width x height): 896x1152, Ratio: 3:4

  • Resolution (width x height): 1024x1024, Ratio: 1:1

  • Resolution (width x height): 1152x896, Ratio: 4:3

  • Resolution (width x height): 1216x832, Ratio: 3:2

  • Resolution (width x height): 1344x768, Ratio: 16:9

You can also use larger resolutions, although this is less stable. (According to research on SD3, when the generated area is increased by a factor of k, the model's uncertainty increases by a factor of k^2.) We recommend that the area of the generated image not exceed 1.5 times the base area; 1024x1536, for example, is exactly 1.5 times 1024x1024.
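
As a minimal sketch (not from the original article), the recommended resolutions and the 1.5x area guideline can be checked in a few lines of Python; the function name is illustrative:

RECOMMENDED = [(768, 1344), (832, 1216), (896, 1152), (1024, 1024),
               (1152, 896), (1216, 832), (1344, 768)]

def within_area_budget(width: int, height: int, max_scale: float = 1.5) -> bool:
    # Keep the pixel area at or below max_scale times the 1024x1024 baseline.
    return width * height <= max_scale * 1024 * 1024

print((832, 1216) in RECOMMENDED)      # True: a recommended resolution
print(within_area_budget(1024, 1536))  # True: exactly 1.5x the base area
print(within_area_budget(1536, 1536))  # False: 2.25x the base area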

2.3.3 Other Precautions

  1. V-prediction models are more sensitive to prompts and generation parameters;

  2. CLIP skip does not apply to any SDXL-architecture model, so there is no need to set it;

  3. The model does not need any external VAE model;

2.4 Other Resources

  • Created by 年糕特工队, NOOBAI XL Quick Guide provides a beginner's tutorial for NoobAI-XL, recommended for beginners. Its English version is NOOBAI XL Quick Guide.

  • Produced by 风吟, "A video to teach you how to use V-prediction models, NoobAI tutorial for beginners" provides a video tutorial on deploying V-prediction models.

3.1 Model Overview

3.1.1 Base Models

NoobAI-XL includes a series of base models with different versions. The table below summarizes the characteristics of each version.

Base Model Versions

  • Version Number: Early-Access

    • Prediction Type: Noise Prediction

    • Download Address: CivitAI, Huggingface

    • Iterated from: Illustrious-xl-early-release-v0

    • Version Characteristics: The earliest version, but already sufficiently trained.

  • Version Number: Epsilon-pred 0.5

    • Prediction Type: Noise Prediction

    • Download Address: CivitAI, Huggingface

    • Iterated from: Early-Access

    • Version Characteristics: (Recommended) The most stable version, the only drawback is insufficient knowledge of niche concepts.

  • Version Number: Epsilon-pred 0.6

    • Prediction Type: Noise Prediction

    • Download Address: Huggingface

    • Iterated from: Epsilon-pred 0.5

    • Version Characteristics: (Recommended) The last version that trained only the UNet, with excellent convergence. The test group nicknamed it "178000", and it was liked by many.

  • Version Number: Epsilon-pred 0.75

    • Prediction Type: Noise Prediction

    • Download Address: CivitAI, Huggingface

    • Iterated from: Epsilon-pred 0.6

    • Version Characteristics: Trained the text encoder (tte) to learn more niche knowledge, but with some degradation in quality.

  • Version Number: Epsilon-pred 0.77

    • Prediction Type: Noise Prediction

    • Download Address: Huggingface

    • Iterated from: Epsilon-pred 0.75

    • Version Characteristics: Trained two more epochs on top of Epsilon-pred 0.75 to mitigate the performance degradation.

  • Version Number: Epsilon-pred 1.0

    • Prediction Type: Noise Prediction

    • Download Address: CivitAI, Huggingface

    • Iterated from: Epsilon-pred 0.77

    • Version Characteristics: (Recommended) Trained 10 additional epochs to consolidate the tte's new knowledge; balanced performance.

  • Version Number: V-pred test

    • Prediction Type: V-prediction

    • Download Address: CivitAI, Huggingface

    • Iterated from: Epsilon-pred 0.5

    • Version Characteristics: (Not recommended) The initial experimental version of V-prediction.

  • Version Number: V-pred 0.5

    • Prediction Type: V-prediction

    • Download Address: CivitAI, Huggingface

    • Iterated from: Epsilon-pred 1.0

    • Version Characteristics: Has the problem of oversaturation.

  • Version Number: V-pred 0.6

    • Prediction Type: V-prediction

    • Download Address: CivitAI, Huggingface

    • Iterated from: V-pred 0.5

    • Version Characteristics: The saturation problem is somewhat alleviated. Based on preliminary evaluation results, V-pred 0.6 performs exceptionally well in terms of rare knowledge coverage, reaching the highest level among currently released models. At the same time, this model significantly improves the quality degradation problem.

  • Version Number: V-pred 0.65

    • Prediction Type: V-prediction

    • Download Address: Huggingface

    • Iterated from: V-pred 0.6

    • Version Characteristics: Has the problem of oversaturation.

  • Version Number: V-pred 0.65s

    • Prediction Type: V-prediction

    • Download Address: CivitAI, Huggingface

    • Iterated from: V-pred 0.6

    • Version Characteristics: The saturation problem is almost solved!

  • Version Number: Epsilon-pred 1.1

    • Prediction Type: Noise Prediction

    • Download Address: CivitAI, Huggingface

    • Iterated from: Epsilon-pred 1.0

    • Version Characteristics: (Recommended) Solved the average brightness problem, with improvements in all aspects.

  • Version Number: V-pred 0.75

    • Prediction Type: V-prediction

    • Download Address: Huggingface

    • Iterated from: V-pred 0.65

    • Version Characteristics: Has the problem of oversaturation.

  • Version Number: V-pred 0.75s

    • Prediction Type: V-prediction

    • Download Address: CivitAI, Huggingface

    • Iterated from: V-pred 0.65

    • Version Characteristics: (Recommended) Solves the problems of saturation, noise, and graininess in extreme cases.

3.1.2 Extended Models: ControlNet

ControlNet Models

  • Prediction Type: Noise Prediction

    • ControlNet Type: hed soft edge

    • Link: CivitAI, Huggingface

    • Preprocessor Type: softedge_hed

  • Prediction Type: Noise Prediction

    • ControlNet Type: anime lineart

    • Link: CivitAI, Huggingface

    • Preprocessor Type: lineart_anime

  • Prediction Type: Noise Prediction

    • ControlNet Type: midas normal map

    • Link: CivitAI, Huggingface

    • Preprocessor Type: normal_midas

  • Prediction Type: Noise Prediction

    • ControlNet Type: midas depth map

    • Link: CivitAI, Huggingface

    • Preprocessor Type: depth_midas

  • Prediction Type: Noise Prediction

    • ControlNet Type: canny contour

    • Link: CivitAI, Huggingface

    • Preprocessor Type: canny

  • Prediction Type: Noise Prediction

    • ControlNet Type: openpose human pose

    • Link: CivitAI, Huggingface

    • Preprocessor Type: openpose

  • Prediction Type: Noise Prediction

    • ControlNet Type: manga line

    • Link: CivitAI, Huggingface

    • Preprocessor Type: manga_line / lineart_anime / lineart_realistic

  • Prediction Type: Noise Prediction

    • ControlNet Type: realistic lineart

    • Link: CivitAI, Huggingface

    • Preprocessor Type: lineart_realistic

  • Prediction Type: Noise Prediction

    • ControlNet Type: midas depth map

    • Link: CivitAI, Huggingface

    • Preprocessor Type: depth_midas

    • Note: New version

  • Prediction Type: Noise Prediction

    • ControlNet Type: hed scribble

    • Link: CivitAI, Huggingface

    • Preprocessor Type: scribble_hed

  • Prediction Type: Noise Prediction

    • ControlNet Type: pidinet scribble

    • Link: CivitAI, Huggingface

    • Preprocessor Type: scribble_pidinet

Note that when using ControlNet, the preprocessor you use must match the preprocessor type the ControlNet model expects. The prediction type of the base model, however, may not need to match the prediction type of the ControlNet. A Diffusers sketch of this pairing follows.
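
A minimal sketch for Diffusers users (not from the original article; paths are placeholders), pairing the canny ControlNet with a canny preprocessor, here OpenCV's Canny:

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

# Placeholder paths: substitute the files downloaded from the links above.
controlnet = ControlNetModel.from_pretrained(
    "/path/to/noobai-controlnet-canny",
    torch_dtype=torch.float16,
)
pipe = StableDiffusionXLControlNetPipeline.from_single_file(
    "/path/to/model.safetensors",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The preprocessor must match the ControlNet type: canny edges here.
src = np.array(Image.open("input.png").convert("RGB"))
edges = cv2.Canny(src, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

image = pipe(
    prompt="masterpiece, best quality, very awa, 1girl",
    image=control_image,
    controlnet_conditioning_scale=0.8,
    width=1024,
    height=1024,
).images[0]
image.save("controlnet_output.png")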

3.1.3 Extended Models: IP-Adapter

IP-Adapter (IPA) has been released on Huggingface and CivitAI.

3.1.4 LoRA Models

Most LoRAs trained on the NoobAI-XL noise prediction version can be used on both noise prediction and V-prediction versions, and vice versa.

3.2 Prompt Guide

First, we need to clarify that the role of prompts is to guide. Good prompts unleash the potential of the model, but bad or even wrong prompts do not necessarily make the results worse. Different models have different optimal prompt usages. The effects of misuse are often not obvious, and in a few cases, they may even be better. This prompt guide records the theoretically optimal prompt writing method for the model, and capable readers can also use their own creativity.

This section will provide a detailed prompt writing guide, including prompt writing specifications, specific usage of character and style trigger words, usage of special tags, and more. Readers interested in prompt engineering can choose to read selectively.

3.2.1 Prompt Specifications

NoobAI-XL has the same prompt specifications as other anime models. This section will systematically introduce the basic writing specifications of prompts and help readers avoid common prompt writing misconceptions in the community.
According to their format, prompts can be roughly divided into two categories: tags and natural language. The former is mostly used for anime models, and the latter is mostly used for realistic models. Regardless of the type of prompt, unless the model specifically states otherwise, it should only contain English letters, numbers, and English symbols. Note that the Chinese comma "，" cannot be used in place of the English comma ",", as they are not equivalent.
Tag prompts consist of lowercase English words or phrases separated by commas ", ", for example, "1girl, solo, blue hair" contains three tags, "1girl", "solo", and "blue hair".
Extra spaces, line breaks, etc. in the prompt will not affect the actual generation effect. In other words, "1girl, solo, blue hair" and "1girl,solo,blue hair" have exactly the same effect.
Prompts should not contain any underscores "_". Influenced by websites such as Danbooru, the usage of underscores "_" instead of spaces " " between words in tags has spread, which is actually a misuse and will cause the generated results to be different from using spaces. Most models, including NoobAI-XL, do not recommend including any underscores in prompts. Such misuse can range from affecting the quality of generation to making trigger words partially or even completely ineffective.
Escape parentheses when necessary. Parentheses, including round brackets, square brackets, and curly brackets, are very special symbols in prompts. Unlike ordinary symbols, in most image generation software and UIs parentheses are interpreted as weighting the enclosed content, and parentheses used for weighting lose their original meaning. But what if the prompt itself needs to contain parentheses, as some trigger words do? The answer is to add a backslash "\" before each parenthesis to disable its weighting function. This operation of changing the original meaning of a character is called escaping, and the backslash is also called an escape character. For example, without escaping, the prompt "1girl, ganyu (genshin impact)" will be incorrectly interpreted as "1girl, ganyu genshin impact", where "genshin impact" is weighted and the parentheses disappear. With the escape characters added, the prompt becomes "1girl, ganyu \(genshin impact\)", which is interpreted as expected.
In short, tag normalization consists of two steps: (i) replace the underscores in the tag with spaces, and (ii) add a backslash "\" before each parenthesis.

Tags taken directly from Danbooru and e621 perform more strongly. Therefore, instead of inventing tags yourself, we recommend searching for tags on these two websites. Note that tags obtained this way use underscores "_" between words and have unescaped parentheses, so before adding them to the prompt you need to replace the underscores with spaces and escape the parentheses. For example, convert the tag "ganyu_(genshin_impact)" from Danbooru to "ganyu \(genshin impact\)" before using it.
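
Since normalization is mechanical, it is easy to automate. A minimal Python sketch (the function name is illustrative):

def normalize_tag(tag: str) -> str:
    # Step (i): replace underscores with spaces.
    tag = tag.replace("_", " ")
    # Step (ii): escape parentheses with a backslash.
    return tag.replace("(", "\\(").replace(")", "\\)")

print(normalize_tag("ganyu_(genshin_impact)"))  # -> ganyu \(genshin impact\)
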
Do not use invalid meta tags. Meta tags are a special category of tags on Danbooru, used to indicate the characteristics of the image file or the work itself. For example, "highres" indicates that the image has a high resolution, and "oil_painting_(medium)" indicates that the image is in the style of an oil painting. However, not all meta tags are related to the content or form of the image. For example, "commentary_request" indicates that the Danbooru post has a translation request for the work, which has no direct relationship with the work itself, so it has no effect.
Ordered prompts are better. NoobAI-XL recommends writing prompts in a logical order, from primary to secondary. A possible writing order is as follows, please use it as a reference only:
<1girl/1boy/1other/female/male/...>, <character>, <series>, <artist(s)>, <general tags>, <other tags>, <quality tags>
Among them, <quality tags> can be placed at the beginning.
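
An illustrative prompt following this order (an assumed example, not from the original article):

1girl, ganyu \(genshin impact\), genshin impact, wlop, long hair, upper body, smile, masterpiece, best quality, very awa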

Natural language prompts are composed of sentences, each sentence starting with a capital letter and ending with a period ".". Most anime models, including NoobAI-XL, have a better understanding of tags, so natural language is often used as an auxiliary rather than a primary component in prompts.

3.2.2 Character and Artist Styles

NoobAI-XL supports the direct generation of a large number of fan-made anime characters and artist styles. Both characters and styles are triggered by their names, and such names are also tags, called trigger words. You can directly search on Danbooru or e621, and use the obtained tags after normalization as prompts.

3.2.2.1 Usage

There are some differences in how characters and artists are triggered.

  • For artist styles, just add the artist's name to the prompt without any prefixes, suffixes, or additional modifiers, neither "by xxx" nor "artist:xxx", just "xxx".

  • For characters, use the format "character name + series". That is, in addition to the character name, you also need to add a series tag immediately after the character trigger word to indicate which work the character is from. If a character has multiple series tags, adding one or more of them is acceptable. Please note that even if the character name already contains the series name, you still need to add the series tag, without worrying about repetition. Usually, "character name + series name" is sufficient to reproduce the character. For example, the trigger word for the character "ganyu_(genshin_impact)" from the series "genshin_impact" is "ganyu \(genshin impact\), genshin impact". As with artists, character trigger words do not need any prefixes, suffixes, or additional modifiers.

The table below shows some correct and incorrect examples of character and style triggering:

Character and Artist Style Trigger Examples

  • Type: Character

    • Prompt: Rei Ayanami

    • Correct/Incorrect: Incorrect

    • Reason: 1. The character name should be "ayanami rei". 2. The series tag "neon genesis evangelion" is not added.

  • Type: Character

    • Prompt: character:ganyu \(genshin impact\), genshin impact

    • Correct/Incorrect: Incorrect

    • Reason: Unnecessarily added the prefix "character:".

  • Type: Character

    • Prompt: ganyu_(genshin_impact)

    • Correct/Incorrect: Incorrect

    • Reason: 1. The tag is not fully normalized: it should not contain underscores. 2. The series tag is not added.

  • Type: Character

    • Prompt: ganyu (genshin impact), genshin impact

    • Correct/Incorrect: Incorrect

    • Reason: The tag is not fully normalized: the parentheses are not escaped.

  • Type: Character

    • Prompt: ganyu (genshin impact\), genshin impact

    • Correct/Incorrect: Incorrect

    • Reason: The tag is not fully normalized: the left parenthesis is not escaped.

  • Type: Character

    • Prompt: ganyu \(genshin impact\)，genshin impact

    • Correct/Incorrect: Incorrect

    • Reason: Used a Chinese comma to separate the two tags.

  • Type: Character

    • Prompt: ganyu \(genshin impact\), genshin impact

    • Correct/Incorrect: Correct

  • Type: Artist Style

    • Prompt: by wlop

    • Correct/Incorrect: Incorrect

    • Reason: Unnecessarily added the prefix "by ".

  • Type: Artist Style

    • Prompt: artist:wlop

    • Correct/Incorrect: Incorrect

    • Reason: Unnecessarily added the prefix "artist:".

  • Type: Artist Style

    • Prompt: dino

    • Correct/Incorrect: Incorrect

    • Reason: The artist's name is wrong. Do not use the artist alias from aidxl/artiwaifu; follow Danbooru instead, so it should be "dino \(dinoartforame\)".

  • Type: Artist Style

    • Prompt: wlop

    • Correct/Incorrect: Correct

3.2.2.2 Trigger Word Encyclopedia

For convenience, we also provide complete trigger word tables in noob-wiki for your reference:
Trigger Word Table Information

  • Danbooru Characters: Click here

  • Danbooru Artist Styles: Click here

  • e621 Characters: Click here

  • e621 Artist Styles: Click here

Trigger Word Table Column Explanations

  • Column Name: character

    • Meaning: The tag name of the character on the corresponding website.

  • Column Name: artist

    • Meaning: The tag name of the artist style on the corresponding website.

  • Column Name: trigger

    • Meaning: The normalized trigger word.

    • Note: Copy and paste it into the prompt as is.

  • Column Name: count

    • Meaning: The number of images with this tag. Access requires a VPN.

    • Note: Serves as a rough indicator of how accurately the model can reproduce the concept. Characters with a count greater than 200, and styles with a count greater than 100, are generally reproduced well.

  • Column Name: url

    • Meaning: The tag page on the original website.

    • Note: Requires a VPN.

  • Column Name: solo_count

    • Meaning: In the dataset, the number of images with this tag and only one character in the image.

    • Note: Character tables only. Characters with a solo_count greater than 50 are generally reproduced well. The count column is a noisier, less accurate indicator; solo_count is more precise.

  • Column Name: core_tags

    • Meaning: The core feature tags of the character, including appearance, gender, and clothing. Separated by English commas, each tag is normalized.

    • Note: Danbooru character table only. When a niche character is not reproduced accurately enough, adding several of its core feature tags can improve accuracy, as illustrated below.
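
For example (a hypothetical niche character, for illustration only), if "alice \(some series\), some series" alone does not reproduce the character accurately, append its core tags from the table:

alice \(some series\), some series, blue hair, red eyes, twintails, school uniform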

3.2.3 Special Tags

Special tags are a type of tag with specific meanings and effects, serving an auxiliary role.

3.2.3.1 Quality Tags

Quality tags are popularity tags derived from Danbooru and e621 user preference statistics. In descending order of quality, they are:
masterpiece > best quality > high quality / good quality > normal quality > low quality / bad quality > worst quality

3.2.3.2 Aesthetic Tags

Aesthetic tags obtained by scoring with an aesthetic scoring model. So far, there are only two, "very awa" and "worst aesthetic". The former represents the top 5% of data weighted by waifu-scorer-v3 and waifu-scorer-v4-beta, and the latter represents the bottom 5% of data. It is named "very awa" because its aesthetic standard is similar to the ArtiWaifu Diffusion model. In addition, an aesthetic tag that is still in training and has an insignificant effect is "very as2", which represents the top 5% of data scored by "aesthetic-shadow-v2-5".
Aesthetic Tag Comparison
Comparison of the effects of quality and aesthetic tags. The image was generated by the v-pred-0.65s version.
Quality tags reflect the popularity of the image, and aesthetic tags reflect the aesthetics of a specific image scoring model.
The "very awa" tag helps to enhance the artistic aesthetics of the image while eliminating the "AI feeling";
"very as2" is superior in training but not yet fully developed, so the effect is not obvious.

3.2.3.3 Safety Rating Tags

There are four safety rating tags: general, sensitive, nsfw, and explicit.
Users are expected to consciously add "nsfw" to negative prompts to filter inappropriate content.

3.2.3.4 Year and Period Tags

Year tags are used to indicate the year of creation of the work, indirectly affecting the quality, style, character accuracy, etc. The format is "year xxxx", where "xxxx" is the specific year, such as "year 2024".
Period tags are a range of year tags, which also have a great impact on image quality. The correspondence between tags and years is shown in the table below:
Year and Period Tag Correspondence

  • Year Range: 2021~2024, Period Tag: newest

  • Year Range: 2018~2020, Period Tag: recent

  • Year Range: 2014~2017, Period Tag: mid

  • Year Range: 2011~2013, Period Tag: early

  • Year Range: 2005~2010, Period Tag: old

3.2.4 Other Tips

This section provides examples of recommended usage of prompts, for reference only.

3.2.4.1 Quality Prompts

The following recommended starting formula uses special tags, which are the most relevant tags related to image quality:

masterpiece, best quality, very awa

3.2.4.2 Negative Prompts

The table below introduces common negative prompt tags and their sources. Not all negative tags are necessarily bad; deliberate use can produce unexpectedly good effects.

  • Tag (tag): worst aesthetic

    • Translation: Worst aesthetics

    • Note: Includes low-quality, watermarked, manga, multiple views, unfinished sketches, and other low-aesthetic concepts

    • Source: Aesthetic Tags

  • Tag (tag): worst quality

    • Translation: Worst quality

    • Source: Quality Tags

  • Tag (tag): low quality

    • Translation: Low quality

    • Note: Danbooru's low quality

    • Source: Quality Tags

  • Tag (tag): bad quality

    • Translation: Bad quality

    • Note: e621's low quality

    • Source: Quality Tags

  • Tag (tag): lowres

    • Translation: Low resolution

    • Source: Danbooru

  • Tag (tag): scan artifacts

    • Translation: Scan artifacts

    • Source: Danbooru

  • Tag (tag): jpeg artifacts

    • Translation: JPEG image compression artifacts

    • Source: Danbooru

  • Tag (tag): lossy-lossless

    • Translation: -

    • Note: Images that have been converted from a lossy image format to a lossless image format, often full of artifacts.

    • Source: Danbooru

  • Tag (tag): ai-generated

    • Translation: AI-generated

    • Note: Generated by AI; such images often have the slick, overprocessed feel of AI generation.

    • Source: Danbooru

  • Tag (tag): abstract

    • Translation: Abstract

    • Note: Eliminates messy lines

    • Source: Danbooru

  • Tag (tag): official art

    • Translation: Official art

    • Note: Illustrations produced by the official company/artist of the series or character. The image may have copyright, company, or artist names printed somewhere, as well as a copyright statement.

    • Source: Danbooru

  • Tag (tag): old

    • Translation: Early image

    • Source: Period Tags

  • Tag (tag): 4koma

    • Translation: 4-panel manga

    • Source: Danbooru

  • Tag (tag): multiple views

    • Translation: Multiple views

    • Source: Danbooru

  • Tag (tag): reference sheet

    • Translation: Character design sheet

    • Source: Danbooru

  • Tag (tag): dakimakura (medium)

    • Translation: Body pillow image

    • Source: Danbooru

  • Tag (tag): turnaround

    • Translation: Full-body turnaround

    • Source: Danbooru

  • Tag (tag): comic

    • Translation: Comic

    • Source: Danbooru

  • Tag (tag): greyscale

    • Translation: Greyscale

    • Note: Black and white image

    • Source: Danbooru

  • Tag (tag): monochrome

    • Translation: Monochrome

    • Note: Black and white image

    • Source: Danbooru

  • Tag (tag): sketch

    • Translation: Sketch

    • Source: Danbooru

  • Tag (tag): unfinished

    • Translation: Unfinished work

    • Source: Danbooru

  • Tag (tag): furry

    • Translation: Furry

    • Source: e621

  • Tag (tag): anthro

    • Translation: Anthropomorphic furry

    • Source: e621

  • Tag (tag): feral

    • Translation: Feral

    • Source: e621

  • Tag (tag): semi-anthro

    • Translation: Semi-anthropomorphic furry

    • Note: When added, it seems to make the image color yellowish

    • Source: e621

  • Tag (tag): mammal

    • Translation: Mammal (furry)

    • Source: e621

  • Tag (tag): watermark

    • Translation: Watermark

    • Source: Danbooru

  • Tag (tag): logo

    • Translation: Logo

    • Source: Danbooru

  • Tag (tag): signature

    • Translation: Artist signature

    • Source: Danbooru

  • Tag (tag): text

    • Translation: Text

    • Source: Danbooru

  • Tag (tag): artist name

    • Translation: Artist name

    • Source: Danbooru

  • Tag (tag): dated

    • Translation: Date

    • Source: Danbooru

  • Tag (tag): username

    • Translation: Username

    • Source: Danbooru

  • Tag (tag): web address

    • Translation: Website address

    • Source: Danbooru

  • Tag (tag): bad hands

    • Translation: Bad hands

    • Source: Danbooru

  • Tag (tag): bad feet

    • Translation: Bad feet

    • Source: Danbooru

  • Tag (tag): extra digits

    • Translation: Extra fingers

    • Source: Danbooru

  • Tag (tag): fewer digits

    • Translation: Fewer fingers

    • Source: Danbooru

  • Tag (tag): extra arms

    • Translation: Extra arms

    • Source: Danbooru

  • Tag (tag): extra faces

    • Translation: Extra faces

    • Source: Danbooru

  • Tag (tag): multiple heads

    • Translation: Multiple heads

    • Source: Danbooru

  • Tag (tag): missing limb

    • Translation: Missing limb

    • Source: Danbooru

  • Tag (tag): amputee

    • Translation: Amputee

    • Source: Danbooru

  • Tag (tag): severed limb

    • Translation: Severed limb

    • Source: Danbooru

  • Tag (tag): mutated hands

    • Translation: Mutated hands

    • Source: -

  • Tag (tag): distorted anatomy

    • Translation: Distorted anatomy

    • Source: -

  • Tag (tag): nsfw

    • Translation: Not Safe For Work

    • Source: Safety Rating Tags

  • Tag (tag): explicit

    • Translation: Explicit

    • Source: Safety Rating Tags

  • Tag (tag): censored

    • Translation: Censored

    • Source: Danbooru
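
Assembled from the tags above, one possible general-purpose negative prompt (illustrative, not canonical) is:

nsfw, worst quality, low quality, lowres, jpeg artifacts, watermark, signature, username, logo, bad hands, extra digits, fewer digits, multiple views, comic, sketch, unfinished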

3.2.4.3 Tag Misuse

Commonly Misused Tags

  • Tag (tag): bad id

    • Translation: Corrupted image ID

    • Note: Related to image metadata, not image content

    • Source: Danbooru

  • Tag (tag): bad link

    • Translation: Corrupted image link

    • Note: Related to image metadata, not image content

    • Source: Danbooru

  • Tag (tag): duplicate

    • Translation: Duplicate image on the website

    • Note: Related to quality to some extent, but not content duplication

    • Source: Danbooru
