How to make a Style LoRA
V12052023001
I. Introduction
This article primarily outlines the method I employed to train a style model, which may not be particularly user-friendly for newcomers. I acknowledge that there are many areas where I could improve, which is why I have been hesitant to share my work until now. I welcome constructive feedback in the comments section. Recently, the model's quality has reached its zenith within my current capabilities. My aim throughout this process has been to produce images that do not blatantly look like AI-generated art, but rather seem as if they were crafted by skilled artists. Due to personal time constraints, I may cease updates at any time. I hope this article will aid others in developing superior models. By following the described method, you should be able to create a style model yielding results similar to the example image provided below.
II. Steps
1. Prepare the training set
The training set's quality sets the ceiling for the final model's performance.
Gather a large number of images in the desired style for training, ensuring you have a backup set for repeated cropping.
Classify the images into four categories: close-up facial shots, main visual images, full-body shots, and images to discard. If a category lacks material, crop some images yourself; if the crops end up at too low a resolution, upscale them with an AI model (e.g., waifu2x). The exact ratio between categories is not critical, but try to have more main visual images. When images are scarce, the backup set can be re-cropped to reuse material.
Close-up facial shots determine the similarity between the characters in the generated images and the original style:
Main visual images influence the composition and overall effect of the generated images, with characters occupying roughly half of the image and an aspect ratio of about 2:3:
Full-body shots help to fit the style of the lower half of the body, and it's not a problem if these are unavailable:
Beyond the aforementioned types: nude images strengthen the fitting of muscle lines, clothed images strengthen the fitting of costumes, images with backgrounds improve background fitting, images without backgrounds improve tag accuracy, black-and-white images strengthen line fitting and improve lighting quality, and so on. Each type of image has its advantages; use them as needed to meet your goals.
Images to discard include those with overly elaborate costumes, where the character blends into the background, with exaggerated poses, or that significantly deviate from the desired style, etc.
Add repetition counts to the categorized images. Following these guidelines will yield better results:
(Main visual images × Repetition count) : (Close-up facial images × Repetition count) ≈ 4 : 1, with other categories having fewer images than close-up facial images.
∑ (number of images in each category × its repetition count × 2) should fall within [1000, 2000]. Below 1000 the fit tends to be poor; above 2000 there is a risk of overfitting. If you do not create horizontally flipped copies when tagging, replace the factor of 2 with 4. (A small sketch of this bookkeeping follows the list.)
Minimize repetition counts while adhering to the second guideline to prevent overfitting.
When there are many differential images, it's advisable to create another category for them. Consider a group of differential images as one and ensure each group has approximately the same number of images. For instance, if I have a group of two differential images and another group of four, I could duplicate the group of two to match the four. Since each group now has four differential images, set the repetition count for this category to a quarter of that for the main visual images.
If the model is primarily used for generating colored images, black and white images should not outnumber colored ones.
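This bookkeeping is easy to get wrong by hand. Below is a minimal Python sketch of the two checks above; the category names and counts are hypothetical, and the folder-name comments assume kohya's sd-scripts convention of encoding repeats as a "repeats_name" folder prefix.

def effective_steps(categories, flipped_copies=True):
    # Sum of (images * repeats * 2); the factor becomes 4 when no
    # horizontally flipped copies are created at tagging time.
    mult = 2 if flipped_copies else 4
    return sum(n * r * mult for n, r in categories.values())

# Hypothetical counts: category -> (image_count, repeat_count)
categories = {
    "main_visual":  (120, 4),  # e.g. folder "4_main_visual"
    "face_closeup": (60, 2),   # e.g. folder "2_face_closeup"
    "full_body":    (40, 1),   # e.g. folder "1_full_body"
}

total = effective_steps(categories)            # 960 + 240 + 80 = 1280
assert 1000 <= total <= 2000, f"{total} is outside [1000, 2000]"

mv = 120 * 4  # main visuals:   images * repeats = 480
fc = 60 * 2   # face close-ups: images * repeats = 120
print(f"main : face = {mv / fc:.1f} : 1")      # 4.0 : 1, as recommended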
Open Stable Diffusion Web UI and adjust the tagger's score threshold. For style models, the threshold is typically set to 0.35; if the style is ornate or the scenes are complex, raise it to around 0.5 to improve tag accuracy.
Settings > Interrogate settings > deepbooru: score threshold
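For intuition, the threshold simply filters deepbooru's per-tag confidence scores; a minimal sketch with made-up scores:

threshold = 0.35  # raise toward 0.5 for ornate or busy styles
scores = {"1girl": 0.99, "smile": 0.72, "frills": 0.41, "jewelry": 0.33}
kept = [tag for tag, conf in scores.items() if conf >= threshold]
print(", ".join(kept))  # 1girl, smile, frills -- jewelry is dropped

Raising the threshold keeps only the tags the tagger is confident about, which is what improves tag accuracy on complex images.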
Tag all images, remembering to check "Create horizontally flipped copies."
Your training set is now complete.
2. Training the Model
Choose a Checkpoint model for LoRA training; I frequently use Anything-V5 PrtRE, which has comprehensive tags.
Adjust the training parameters, paying attention mainly to the ones below; adjust the rest as needed or leave them at their defaults. Set epochs to 8 or more, or up to 128 if resources allow. I use the default AdamW8bit optimizer, since Lion's learning rate is hard to keep under control. If you are confident in your own parameters, train with those instead; there is no need to stick to mine.
save_every_n_epochs = 1
noise_offset = 0.1
keep_tokens = 3
lr = "1e-4"
unet_lr = "1e-4"
text_encoder_lr = "1e-4"
optimizer_type = "AdamW8bit"
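For context, these keys follow kohya's sd-scripts naming. A fuller, hypothetical configuration around them might look like the sketch below; the checkpoint path, dataset folder, output name, and network size are placeholders to adapt to your setup:

pretrained_model_name_or_path = "Anything-V5PrtRE.safetensors"  # placeholder path to the base checkpoint
train_data_dir = "train_data"   # the category folders prepared in step 1
output_dir = "output"
output_name = "Test0"
network_module = "networks.lora"
network_dim = 32                # placeholder; size to your needs and VRAM
max_train_epochs = 8            # 8 or more; up to 128 if feasible
save_every_n_epochs = 1
noise_offset = 0.1
keep_tokens = 3
learning_rate = "1e-4"
unet_lr = "1e-4"
text_encoder_lr = "1e-4"
optimizer_type = "AdamW8bit"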
Start training with the prepared dataset. After training, pick the 1~4 epochs with the lowest loss; below, these are called the models to be debugged. Next, prepare to test and adjust them.
3. Debugging
First, install two extensions, SuperMerger and LoRA Block Weight (e.g., via Extensions > Install from URL), then restart Stable Diffusion Web UI. If this step throws errors, search and troubleshoot on your own; skip it if they are already installed.
https://github.com/hako-mikan/sd-webui-supermerger.git
https://github.com/hako-mikan/sd-webui-lora-block-weight.git
Generate test images with each of the models to be debugged and pick the one you consider best to tune first. Suppose the chosen model is named Test0; set it aside for now. The X/Y/Z plot script's Prompt S/R mode is handy for comparing the candidates in a single grid:
X/Y/Z plot > Prompt S/R
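A hypothetical setup: with save_every_n_epochs = 1, sd-scripts typically names intermediate checkpoints like Test0-000002; Prompt S/R then substitutes each X value for the first one inside the prompt, producing one image per candidate:

Prompt:   masterpiece, best quality, 1girl, <lora:Test0:1>
X type:   Prompt S/R
X values: Test0, Test0-000002, Test0-000004, Test0-000006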
Next, prepare a style model of the Checkpoint we used for LoRA training. Use tags such as face_focus/cowboy_shot/full_body to generate images with that checkpoint, then train on them to obtain a LoRA of the checkpoint's own style. This is the AnythingV5RE style model I use myself:
Use the Checkpoint's style model to cancel out the base style in the model being debugged, Test0, while adjusting the LoRA Block Weights appropriately.
Test0:1:GP1,AnythingV5RE_BASE:-1:GP2

# For style models, commonly used parameters are:
#    01| 02| 03| 04| 05| 06| 07| 08| 09| 10| 11| 12| 13| 14| 15| 16| 17
GP1:1.0,0.6,0.9,0.9,0.9,0.6,0.5,0.4,0.4,0.6,0.8,1.0,1.1,1.1,1.1,1.1,1.1
GP2:0.3,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.2,0.2,0.2,0.2,0.2

# If the style isn't pronounced enough, or the training ran for few steps:
GP1:1.0,0.6,1.0,1.0,1.0,0.8,0.6,0.4,0.4,0.5,0.6,1.0,1.2,1.2,1.2,1.2,1.2
GP2:0.3,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.3,0.3,0.3,0.3,0.3
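For orientation when reading the 17 positions, LoRA Block Weight's 17-value preset for SD1.x LoRAs is documented as mapping to the blocks below (01 = BASE, which weights the text encoder); verify this against the version of the extension you installed:

# 01=BASE, 02=IN01, 03=IN02, 04=IN04, 05=IN05, 06=IN07, 07=IN08,
# 08=MID, 09=OUT03, 10=OUT04, 11=OUT05, 12=OUT06, 13=OUT07,
# 14=OUT08, 15=OUT09, 16=OUT10, 17=OUT11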
Increase the weight of GP2 01 when the style is underfit (not pronounced enough); if ineffective, increase the weight of GP1 13~17, along with GP2 13~17.
Decrease the weight of GP1 01 when limbs are overfit (deformed); if ineffective, reduce the weight of GP1 06~11.
Reduce the weight of GP1 04~07 when hands are overfit (deformed), along with the corresponding GP2 weights.
Increase the weight of GP1 06~11 when facial features are underfit (not prominent).
Reduce the weight of GP1 03~05 when clothing is overfit (disheveled).
When the overall image is chaotic (many problems at once), first try reducing the weights of GP1 03~05; if that doesn't help, reduce the overall strength, e.g., Test0:1:GP1 --> Test0:0.8:GP1. If it still doesn't help, consider lowering or raising the GP2 weights. (A worked example of one such adjustment follows.)
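As a hypothetical round of tuning: suppose hands come out deformed. Following the hand rule above, lower positions 04~07 of GP1 and the matching GP2 entries, leaving everything else untouched:

# GP1 before (positions 04~07 are 0.9,0.9,0.6,0.5):
GP1:1.0,0.6,0.9,0.9,0.9,0.6,0.5,0.4,0.4,0.6,0.8,1.0,1.1,1.1,1.1,1.1,1.1
# GP1 after lowering 04~07:
GP1:1.0,0.6,0.9,0.7,0.7,0.5,0.4,0.4,0.4,0.6,0.8,1.0,1.1,1.1,1.1,1.1,1.1
# GP2 after lowering the corresponding 04~07 entries from 0.1 to 0.05:
GP2:0.3,0.1,0.1,0.05,0.05,0.05,0.05,0.1,0.1,0.1,0.1,0.1,0.2,0.2,0.2,0.2,0.2

Regenerate, compare against the previous grid, and change one rule's worth of weights at a time so each effect stays attributable.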
It's not recommended to use Block Weight during LoRA training; our approach is to let LoRA learn as much as possible, then selectively adjust what has been learned.
Continue adjusting the weights until the model's performance is satisfactory.
The model creation is complete. ^_^
Through the steps above, you can manually strip out most of the junk data and repair overfitting or underfitting by hand, packing a small LoRA full of useful information.