Anime Illust Diffusion XL
Model Introduction (English Part)
I Contents
In this introduction, you'll learn about:
Model information (see Section II);
Instructions for use (see Section III);
Training parameters (see Section IV);
List of trigger words (see Appendix A).
II AIDXL
Anime Illustration Design XL, or AIDXL, is a model dedicated to generating stylized anime illustrations. It has more than 200 built-in illustration styles (with more added in each update), which are triggered by specific trigger words (see Appendix A).
Advantages:
Flexible composition instead of the stiff, posed look typical of AI images.
Refined details instead of chaos.
Better knowledge of anime characters.
Note that the model is somewhat difficult to use and is not recommended for beginners.
III User Guide
(Kept up to date)
There is now no noticeable difference in generation between WebUI and ComfyUI.
1 Suggested Generation Parameters
Usage 1: Direct Use
That is, trigger styles by using the trigger words directly in text-to-image generation.
Usage 2: Stylization
That is, in image-to-image, use trigger words together with a denoising strength of 0.5~0.65 to restyle an image (see the sketch below).
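For reference, the sketch below shows Usage 2 with the diffusers library; the checkpoint path, input image, and prompt are illustrative placeholders, not part of the official instructions.

```python
# Usage 2 (stylization) sketch with the diffusers library. The checkpoint path,
# input image and prompt are placeholders.
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLImg2ImgPipeline.from_single_file(
    "AIDXL.safetensors",          # placeholder path to the AIDXL checkpoint
    torch_dtype=torch.float16,
).to("cuda")

init_image = load_image("input.png")   # the image you want to restyle

image = pipe(
    prompt="by yoneyama mai, 1girl, solo, upper body, looking at viewer",
    negative_prompt="lowres, bad anatomy, bad hands",
    image=init_image,
    strength=0.6,                 # denoising strength in the suggested 0.5~0.65 range
    num_inference_steps=35,
    guidance_scale=7.0,
).images[0]
image.save("stylized.png")
```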
If you are unable to generate an image similar to the model cover, please follow the guidelines below.
Resolution: Keep the total image resolution between 768*768 and 1152*1152, where the total resolution is the image height multiplied by its width. Otherwise, the generated image may be of low quality or deformed.
Prompt format: Write the positive prompt as natural language + tags.
For the natural-language part, increase the density of nouns and avoid abstract adjectives or stacking multiple adjectives on a single noun.
Also, there is no need for a long negative prompt; keep it to no more than 10 tags.
Prompt writing: Prompts are essential. A good prompt uses specific, noun-centered descriptions, which is crucial for generating a masterful illustration.
The following tips are important for the AIDXL models:
Both overly short prompts with vague descriptions and overly long prompts full of weighting syntax lead to chaotic, unstable generations.
Here is why: (i) overly short prompts are too abstract for the model; perhaps even you do not know what you want. (ii) Overly long prompts are, on the one hand, too complex for the model; on the other hand, they reproduce poorly on other models.
Both styles may still work elsewhere: for (i), because a particular model happens to handle short, vague prompts well or has an especially powerful CLIP text encoder; for (ii), because the models that can parse the same complicated prompt tend to be similar to one another.
AIDXL models are trained on WaifuTagger tags, so using 10 to 35 tags tends to give better results.
Generation parameters: CLIP skipping is not needed; set 'clip skip' to 1.
The dpmpp_2m sampler with the karras scheduler is recommended; it is called DPM++ 2M Karras in WebUI. Sample at least 25 steps at a CFG of 4~9; 35 steps with CFG 7 is recommended (see the sketch at the end of this list).
Characters: The model is trained on WaifuTagger tags and knows more anime characters. They are triggered by the character's name. For example, the tag "ayanami rei" corresponds to the character Ayanami Rei, "kamado nezuko" corresponds to Nezuko, and "lucy \(cyberpunk\)" corresponds to Lucy from "Cyberpunk: Edgerunners", where "\(cyberpunk\)" is a disambiguating qualifier (the backslashes escape the parentheses in WebUI). You can find all supported characters and their corresponding tags here: selected_tags.csv · SmilingWolf/wd-v1-4-swinv2-tagger-v2 at main (huggingface.co).
Other components: No refiner model is needed. Use the VAE of the model itself or the sdxl-vae.
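The settings above map onto a text-to-image call roughly as follows. This is a minimal sketch assuming the diffusers library, with a placeholder checkpoint path and an example prompt; note that the backslash escaping of parentheses is WebUI syntax and is not needed here.

```python
# Text-to-image sketch with the suggested parameters. The checkpoint path and
# prompt are placeholders; DPM++ 2M Karras == dpmpp_2m sampler + karras scheduler.
import torch
from diffusers import (
    AutoencoderKL,
    DPMSolverMultistepScheduler,
    StableDiffusionXLPipeline,
)

vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae", torch_dtype=torch.float16)

pipe = StableDiffusionXLPipeline.from_single_file(
    "AIDXL.safetensors",    # placeholder path to the AIDXL checkpoint
    vae=vae,                # sdxl-vae; the VAE baked into the checkpoint also works
    torch_dtype=torch.float16,
).to("cuda")

pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

# Natural language first, then tags; "by wlop" is a style trigger word and
# "lucy (cyberpunk)" a character tag (no backslash escaping outside WebUI).
prompt = (
    "a girl standing in a rainy neon-lit street at night, by wlop, "
    "1girl, lucy (cyberpunk), solo, looking at viewer, rain, night, city lights"
)
negative_prompt = "lowres, bad anatomy, bad hands, worst quality"  # keep it short

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=1216,             # 832*1216 is about 1.01 MP, inside the 768*768~1152*1152 range
    num_inference_steps=35,  # at least 25
    guidance_scale=7.0,      # CFG in the 4~9 range
).images[0]
image.save("txt2img.png")    # clip skip stays at 1 and no refiner is used
```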
2 Notes
Component bugs: If a component does not work properly, check whether it is designed for SDXL. This usually happens with VAEs, textual inversion embeddings, and LoRAs.
Note that sd-vae-ft-mse-original is an SD1.5 VAE, not an SDXL-capable one. Negative embeddings like EasyNegative and badhandv4 are also not SDXL-capable embeddings.
Specially-made components: I do recommend using the model-specific negative embedding (see the Suggested Resources), which is trained specifically for this model and has almost exclusively positive effects.
Partial underfitting: Trigger words newly added in a version will be relatively weak or unstable. This improves over the next few versions; usually the very next version is fine.
Quality trigger words: Starting from v0.5, new quality trigger words have been added. They are captioned onto the training images manually, and their effect is currently weak.
3 Experiments
The stylization effects of trigger words can be merged with one another to produce new styles, for example with the prompt-alternation syntax [by A | by B | by C] (in WebUI, something like [by wlop | by nixeu | by shal-e]).
IV Training Parameters
AIDXLv0.1
Using SDXL 1.0 as the base model, about 22k labeled images were trained for about 100 epochs with a learning rate of 5e-6 on a cosine scheduler with number of cycles = 1 to obtain model A. Model B was then trained with a learning rate of 2e-7 and the same other parameters. The AIDXLv0.1 model was obtained by merging models A and B.
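The merge ratio for models A and B is not stated above. The sketch below shows a generic weighted state-dict merge with an assumed 50/50 ratio and placeholder file names; it is one plausible way to perform such a merge, not the documented procedure.

```python
# Weighted checkpoint merge sketch. The 0.5 ratio and file names are
# illustrative assumptions; the actual merge ratio is not documented.
import torch
from safetensors.torch import load_file, save_file

alpha = 0.5  # weight of model A (assumption)
state_a = load_file("model_A.safetensors")
state_b = load_file("model_B.safetensors")

merged = {}
for key, tensor_a in state_a.items():
    tensor_b = state_b[key]
    # linear interpolation of every weight tensor, cast back to the original dtype
    blended = alpha * tensor_a.float() + (1.0 - alpha) * tensor_b.float()
    merged[key] = blended.to(tensor_a.dtype)

save_file(merged, "AIDXLv0.1.safetensors")
```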
AIDXLv0.51
Training Strategy
Training resumes from AIDXLv0.5 and consists of three runs pipelined one after another:
Long-caption training: Use the whole dataset, with some images captioned manually. Train both the U-Net and the text encoder with the AdamW8bit optimizer and a high learning rate (around 1.5e-6) on a cosine scheduler. Stop when the learning rate decays below a threshold (around 5e-7).
Short-caption training: Resume from the output of step 1 with the same parameters and strategy, but with a dataset whose captions are shorter.
Refining step: Prepare a subset of the step-1 dataset containing manually picked high-quality images. Resume from the output of step 2 with a low learning rate (around 7.5e-7) and a cosine scheduler with 5 to 10 restarts. Train until the result is aesthetically good.
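For illustration, the learning-rate schedules described above can be sketched as follows; only the learning rates, the roughly 5e-7 stop threshold, and the 5-10 restart count come from the text, while the total step counts are assumptions.

```python
# Illustrative sketch of the learning-rate schedules described above.
import math

def cosine_lr(step: int, total_steps: int, base_lr: float = 1.5e-6) -> float:
    """Plain cosine decay used for the long/short-caption runs."""
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))

def cosine_restarts_lr(step: int, total_steps: int, base_lr: float = 7.5e-7,
                       num_cycles: int = 8) -> float:
    """Cosine with restarts (5 to 10 cycles) used for the refining step."""
    progress = (step / total_steps) * num_cycles % 1.0
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Steps 1-2 stop once the learning rate decays below ~5e-7:
threshold, total = 5e-7, 10_000   # total step count is an assumption
for step in range(total):
    if cosine_lr(step, total) < threshold:
        print(f"stop the long-caption run at step {step}")
        break
```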
Fixed Training Parameters
No extra noise like noise offset.
Min-SNR gamma = 5: speeds up training (see the sketch after this list).
Full bf16 precision.
AdamW8bit optimizer: a balance between efficiency and performance.
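A minimal sketch of the Min-SNR-gamma loss weighting (gamma = 5), assuming the standard epsilon-prediction form; the SNR and loss values below are made up for illustration.

```python
# Min-SNR-gamma loss weighting sketch (epsilon-prediction form), gamma = 5 as above.
import torch

def min_snr_weights(snr: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    """Per-timestep loss weights: min(SNR(t), gamma) / SNR(t)."""
    return torch.minimum(snr, torch.full_like(snr, gamma)) / snr

# Usage: scale each sample's MSE loss by the weight of its sampled timestep.
snr = torch.tensor([0.05, 1.0, 25.0, 400.0])      # example SNR values for 4 timesteps
per_sample_mse = torch.tensor([0.9, 0.4, 0.2, 0.1])
loss = (min_snr_weights(snr) * per_sample_mse).mean()
```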
Dataset
Resolution: 1024x1024 total resolution (= height times width) with a modified version of the official SDXL bucketing strategy (see the sketch after this list).
Captioning: Captioned by the WD14-SwinV2 tagger with a 0.35 threshold.
Close-up cropping: Crop images into several close-ups; very useful when the training images are large or scarce.
Trigger words: Keep the first tag of each image as its trigger word.
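The exact modification to the official bucketing strategy is not documented here. The sketch below is a generic aspect-ratio bucketing baseline that targets a roughly 1024x1024 total-resolution budget with sides snapped to multiples of 64; it illustrates the idea rather than the actual implementation.

```python
# Aspect-ratio bucketing sketch: enumerate (width, height) buckets whose area is
# close to 1024*1024 and whose sides are multiples of 64. This is a generic
# baseline, not the exact modified strategy used for AIDXL.
def make_buckets(target_area: int = 1024 * 1024, step: int = 64,
                 min_side: int = 512, max_side: int = 2048) -> list[tuple[int, int]]:
    buckets = []
    width = min_side
    while width <= max_side:
        height = int(target_area / width) // step * step
        if min_side <= height <= max_side:
            buckets.append((width, height))
        width += step
    return sorted(set(buckets))

def assign_bucket(img_w: int, img_h: int, buckets: list[tuple[int, int]]) -> tuple[int, int]:
    """Pick the bucket whose aspect ratio is closest to the image's."""
    aspect = img_w / img_h
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))

buckets = make_buckets()
print(assign_bucket(1200, 1600, buckets))   # a 3:4 image lands in the 896x1152 bucket
```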
V AIDXL vs AID
2023/08/08. AIDXL is trained on the same training set as AIDv2.10 but outperforms it. AIDXL is smarter and can do many things that SD1.5-based models cannot. It also does a really good job of distinguishing between concepts, learning image detail, and handling compositions that are difficult or even impossible for SD1.5 and AID. Overall, it has far more potential, and I will keep updating AIDXL.
Model Introduction (Chinese Part)
I Contents
In this introduction, you will learn about:
Model introduction (see Section II);
User guide (see Section III);
Training parameters (see Section IV);
Trigger word list (see Appendix A).
II Model Introduction
Anime Illustration Design XL, or AIDXL, is a model dedicated to generating anime illustrations. It has more than 200 built-in illustration styles (with more added in each update), triggered by specific trigger words (see Appendix A).
Advantages: bold compositions without a posed look; prominent subjects without excessive cluttered detail; knowledge of many anime characters (triggered by the romanized Japanese name, e.g. "ayanami rei" corresponds to Ayanami Rei and "kamado nezuko" corresponds to Nezuko).
The model is relatively difficult to use and is not recommended for beginners.
III User Guide (kept up to date)
ComfyUI is recommended for generating images...
Now, there is no noticeable difference in generation between WebUI and ComfyUI.
1 Generation Parameters
Usage 1: Direct Use
That is, trigger styles by using trigger words directly in text-to-image generation.
Usage 2: Stylization
That is, in image-to-image, use trigger words with a denoising strength of 0.5~0.65 to apply the corresponding style to the image.
If you cannot generate images similar to the previews, follow the guidelines below.
It is recommended to keep the total image resolution (total resolution = height x width) above 1024x1024 and below 1024x1024x1.5; otherwise the generated image may be of low quality. This is a rule of thumb: the total resolution of a generated image should be higher than that of the training images while staying below 1.5 times it, to prevent blurring and deformation. For example, this model was trained at a total resolution of 1024x1024, so the largest image you can generate is 1024x1536 (taking a 2:3 ratio as an example).
It is recommended to write the positive prompt as natural language + tags. Increase the density of nouns in the natural-language part, and avoid abstract adjectives or stacking multiple adjectives on a single noun. Also, do not use too many negative prompt terms; no more than 10 is recommended.
Do not skip CLIP layers, i.e. Clip Skip = 1.
Use the dpmpp_2m sampler with the karras scheduler; the combination is called DPM++ 2M Karras in WebUI. Sample for at least 35 steps at a CFG scale of 7.
Only the model itself is needed; do not use a refiner.
Use the base model's VAE or the sdxl-vae.
Use the trigger words provided in the appendix to make full use of stylization. Note that starting from v0.5, some quality prompt words such as best quality and masterpiece are supported; using them will raise the average aesthetic quality of images (though not always).
2 Notes
Use VAE models, text embeddings, and LoRA models that support SDXL. Note: sd-vae-ft-mse-original is not an SDXL-compatible VAE; negative embeddings such as EasyNegative and badhandv4 are not SDXL-compatible embeddings either.
When generating images, it is strongly recommended to use the model-specific negative text embedding (see the Suggested Resources section for the download). Because it is made specifically for this model, it has almost only positive effects on it.
Because of preliminary training, trigger words newly added in a version will be relatively weak or unstable in that version.
3 Experiments
The styles pointed to by trigger words can blend with each other to produce new styles.
Starting from v0.5, quality prompt words have been added.
IV Training Parameters
Using SDXL 1.0 as the base model, about 20k self-labeled images were trained for about 100 epochs with a learning rate of 5e-6 on a cosine scheduler with one cycle to obtain model A. Model B was then trained with a learning rate of 2e-7 and the same other parameters. AIDXLv0.1 was obtained by merging models A and B.
V Comparison with SD1.5-based AID
2023/08/08: AIDXL was trained on exactly the same training set as AIDv2.10 but performs better. AIDXL is smarter and can do many things that SD1.5-based models cannot. It also distinguishes concepts well, learns image details, handles compositions that are next to impossible for SD1.5, and learns almost perfectly the styles that the old AID could never fully master. Overall, it definitely has a higher ceiling than SD1.5, and I will keep updating AIDXL.
Appendix
A. Trigger Words List
v0.1 & v0.2: by 35s00, by 3meiji, by 5eyo, by 7nu, by 7thknights, by adenim, by agm, by ajimita, by akizero, by ame929, by anmi, by anteiru, by arutera, by ask, by atelier irrlicht, by bunbun, by caaaaarrot, by camu, by canking, by ccroquette, by chi4, by chicken utk, by chon, by cola, by cutesexyrobutts, by darumakarei, by dino, by dora, by dsmile9, by ei maestrl, by ekita kuro, by ekita xuan, by eku uekura, by fadingz, by fajyobore, by foomidori, by freng, by fuzichoco, by gesoking, by gomzi, by hachisan, by hakuhiru oeoe, by hamukukka, by haru, by hata, by hidulme, by hikinito0902, by hinaki, by hitoimim, by hitomio16, by hizumi, by homutan, by hotatenshi, by houk1se1, by hyatsu, by icecenya, by ichigo ame, by inoriac, by iromishiro, by iwzry, by jnthed, by joezunzun, by junsui0906, by karohroka, by kaya7hara, by kazari tayu, by killow, by kin, by kinta, by kishiyo, by kitada mo, by kkuni, by konya karasue, by kooork55, by kot rou020, by krenz, by kurige horse, by kuroume, by lalalalack, by lemoneco, by lm7, by lovelymelm, by lpmya, by mar takagi, by matcha, by matsukenmanga, by melowh, by menou, by midori xu, by mika pikazo, by misumigumi, by miv4t, by mochizukikei, by mogumo, by momoco, by momoku, by morikuraen, by mqkyrie, by muina, by munashichi, by muryou tada, by myaru, by myc0t0xin, by myung yi, by nack, by naji yanagida, by nanmo, by nardack, by narue, by nekojira, by netural, by nezukonezu32, by nico tine, by nikuzume, by nine, by nineo, by ninev, by niwa uxx, by nixeu, by noco, by noodle4cool, by nounoknown, by noyu, by oda non, by omutatsu, by onineko, by palow, by panp, by pikuson, by poharo, by poire, by potg, by pro-p, by qooo003, by rai hito, by rattan, by reiko, by rella, by rhtkd, by rin7914, by roitz, by ryuseilan, by saberiii, by sais, by sakiika, by samip, by sanosomeha, by say hana, by scottie0073, by senryoko, by serie niai, by seuhyo99, by shal-e, by shimanun, by shirabii, by shiraishi kanoya, by shiren, by shirentutu, by sho, by sia, by siki, by silver, by solipsist, by some1else45, by sonomura00, by sooon, by star furu, by starshadowmagic, by starzin07, by sui 0z0, by sul, by sushi0831, by suzukasuraimu, by taiki, by takumi bis, by teffish, by tidsean, by tira27, by tsukiho tsukioka, by tsvbvra, by ttosom, by tukumi bis, by uiiv, by ukiatsuya, by umaiyo puyoman, by void, by wait ar, by walzrj, by wanke, by whoisshe, by wlop, by xilmo, by yejji, by yogisya, by yohan, by yomu, by yoneyama mai, by yosk6000, by yumenouchi, by yun216, by yunikon147, by yunsang, by ziyun, by zumoti4
v0.3 adds: by akita hika, by asaikeu, by atdan, by bannou, by bison, by bodhi, by bonnie23, by cell, by chela77, by coco1758, by ebkim, by eichi, by electrophorus, by fungi, by gekidan, by glutton, by hews, by hirohorn, by hle, by hlymoriia, by icomochi, by iumu, by jeone0, by kana dfy, by kikinoki, by kumori ufo, by kurige, by lam, by liaowen, by limuli ceey, by lirseven, by maenoo, by magotsuki, by marusin, by mechari, by minncn, by modare, by r1zen, by rag ragko, by rannou, by rolua, by rurudo, by saclia, by sai izumi, by takunomi, by tedineon, by torino, by tororoshanyao, by tsubonari, by uuuzan, by yamanokami eaka, by zumizumi
v0.5 adds: by 3333382, by agoto, by am1m, by apoco, by aroa, by asea, by asets96, by asicah, by attas, by ayanon, by baihuahua, by ballpa cohi, by buzhijinfeng, by chai, by choyeon, by ciwu, by criin, by cupoi, by demizu posuka, by diyokama, by eyyy, by futamotu, by ganet, by greembang, by han, by hapu, by hcc33, by hoodxart, by hxxg, by ikky, by imlllsn, by japste, by jeanorgan, by jhcoon, by jiaming, by jirujiaru, by jjjsss, by jlt4n, by jojomaki, by jue, by jumbo, by kaedelic, by katann, by kieed, by kinokohime, by kirin, by kji rozo, by kmgrn, by kookie, by ksorede, by kuroduki, by kyusoukyu, by laza, by letco, by linfi muu, by lingli, by lizhiyan360, by luenar, by mamenomoto, by mashilemo, by mayf42, by mgdown, by miaopulu, by michihasu, by misaka12003, by mitsuki sanagi, by mizokooohmygod, by moguta, by moyu marginal, by mygom, by myless23, by nbrush19, by necomi, by nekoshoko, by nemn, by ninebell, by nininisama, by njer, by nmk, by nocopyrightgirl, by noir, by novelance, by ohanatoomoti, by omao, by qysthree, by rumoon, by ruoganzhua, by saltsaltzome, by sanmuyyb, by sannso, by saturn, by senju yosiyuki, by shant, by sheya, by shirataking, by siokazunoko, by siu, by skyjack, by sogawa, by ssr susu, by swd3e22, by swkld, by tansuan, by terite3lio, by timo, by uenomigi, by unfairr, by xianyuliangryo, by xiaoluoxl, by yaegasinan, by yarn, by yktmr10, by ymqqq, by yolanda, by yuichohui, by yumo012, by yutomaru, by yuzuyomogi
v0.5 adds (quality / traditional style): impasto, pseudo impasto, photorealistic, cel shading, flat color, realistic, oil painting, sketch, 3d, vivid color, perspective
B. Trigger Words Introduction
(Updating...)
Realistic style (realistic): by wlop, by nixeu, by shal-e
Cel-shaded style (cel shading): by void, by 7thknights, by novelance, by ciwu, by homutan, by melowh
Impasto style (impasto): by dino, by xilmo, by solipsist, by reiko, by some1else45, by noodle4cool, by unfairr
Flat-color style (flat color): by uenomigi, by magotsuki, by 3333382, by eku uekura, by hakuhiru oeoe, by hamukukka, by haru, by hirohorn, by hizumi, by ichigo ame, by kooork55, by mitsuki sanagi, by qooo003, by nezukonezu32, by nico tine, by nocopyrightgirl, by sanosomeha, by sonomura00, by tsubonari, by tsvbvra, by tukumi bis, by ukiatsuya, by yktmr10, by yosk6000, by yutomaru