Type | |
Stats | 340 |
Reviews | (66) |
Published | Sep 30, 2024 |
Base Model | |
Training | Steps: 33,496 Epochs: 16 |
Usage Tips | Clip Skip: 2 |
Hash | AutoV2 C6CA90FD31 |
Basic Information:
This is an experimental fine-tuned (Ft) model trained on top of the pre-trained Animagine 3.1 model. The fine-tuning aims to improve the detail and final artistic quality of flat-style images. Roughly 100,000 images were collected from the internet, then manually screened, evaluated for aesthetics, and categorized by art style. From these, 10,049 images with relatively complete compositions, relatively accurate human anatomy, and well-executed art styles were selected as the training set.
(The English portion was translated by GPT.)
Pre-training:
Animagine 3.1 is an excellent model with a large dataset, fairly complete concept coverage, and fairly precise aesthetic classification. However, images within the same aesthetic score tier can still vary in art style. Flat and Scribble styles in particular tend to act as opposing vectors, yet because they sit in the same tier they interfere with each other, most visibly in details such as hair and skin.
Quality Modifier | Score Criterion |
masterpiece | > 95% |
best quality | > 85% & ≤ 95% |
great quality | > 75% & ≤ 85% |
good quality | > 50% & ≤ 75% |
normal quality | > 25% & ≤ 50% |
low quality | > 10% & ≤ 25% |
worst quality | ≤ 10% |
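The score tiers above can be read as a simple threshold lookup. The following is a minimal sketch, not the tooling actually used for Animagine 3.1 or this model: the function name `quality_modifier` and the parameter `score_percent` are hypothetical, and the table does not say how the score itself is computed, so the function just treats it as a percentage value.

```python
# Lower bounds (exclusive) for each quality modifier, taken from the table.
# Ordered from highest tier to lowest so the first match wins.
QUALITY_TIERS = [
    (95.0, "masterpiece"),
    (85.0, "best quality"),
    (75.0, "great quality"),
    (50.0, "good quality"),
    (25.0, "normal quality"),
    (10.0, "low quality"),
]


def quality_modifier(score_percent: float) -> str:
    """Return the quality modifier for a score given as a percentage (0-100).

    `score_percent` is a hypothetical input; how the score is derived
    is not described in the source.
    """
    for lower_bound, tag in QUALITY_TIERS:
        if score_percent > lower_bound:
            return tag
    # Anything at or below 10% falls into the lowest tier.
    return "worst quality"
```

Each upper bound is inclusive, so a score of exactly 95 maps to best quality rather than masterpiece.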
High-quality training images that can be readily collected are always limited in number, so training sets inevitably tend to converge, and training directly on them risks overfitting before reclassification is complete. We therefore ran a pre-training stage first to pull the art styles contained in A31 toward a more average level. In practice, about 3,000 distinct original anime images were used to shift the style toward flat coloring before the main fine-tuning.
Data Classification:
The classification criteria (prompt tags) are image cleanliness | composition quality | art style. A31's aesthetic classification standards were not adopted directly. A31's own prompt tags still work, but using them mostly brings out A31's original art style.
Image cleanliness:
1. extremely_clean_colorstyle
2. very_clean_colorstyle
3. medium_clean_colorstyle
4. slightly_scribble_colorstyle
5. very_scribble_colorstyle
6. extremely_scribble_colorstyle
These tags are meant literally.
Composition quality:
1. excellent composition
2. good_composition
3. common_composition
4. bad_composition
This was a naive attempt: composition quality was classified purely by subjective human judgment. The original intent was to reclassify images with poor overall composition, such as those where the character takes up too much of the frame. In testing, the tags proved essentially ineffective at this training scale, and this labeling method will be refined or dropped in future training.
Art style:
Artist styles such as those shown in the samples are available as tags. Note that an artist tag is not limited to that artist's own works; it stands for a representative family of similar styles. Some artist names were borrowed as labels to make classification easier, and the images under each tag include work by other artists with similar styles as well as other art resources.
Other art categories are similar to the tags used in Nai and are not shown one by one in the samples.
Example: 1girl, ganyu (genshin impact), very_clean_colorstyle, good_composition, artist_ShinyColors
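The tag order in the example (subject tags, then cleanliness, composition, and artist style) can be assembled programmatically. This is only an illustrative sketch: `build_prompt` and its parameters are hypothetical helpers, not part of the model or any library, and only the six cleanliness tags listed above are validated. Composition and artist tags are passed through unchanged.

```python
# The six cleanliness tags documented for this model.
CLEANLINESS_TAGS = (
    "extremely_clean_colorstyle",
    "very_clean_colorstyle",
    "medium_clean_colorstyle",
    "slightly_scribble_colorstyle",
    "very_scribble_colorstyle",
    "extremely_scribble_colorstyle",
)


def build_prompt(subject_tags, cleanliness, composition=None, artist_tag=None):
    """Join subject tags with the model's classification tags.

    Hypothetical helper: it mirrors the order used in the example prompt
    (subject, cleanliness, composition, artist) and nothing more.
    """
    if cleanliness not in CLEANLINESS_TAGS:
        raise ValueError(f"unknown cleanliness tag: {cleanliness!r}")
    parts = list(subject_tags) + [cleanliness]
    if composition:
        parts.append(composition)
    if artist_tag:
        parts.append(artist_tag)
    return ", ".join(parts)


prompt = build_prompt(
    ["1girl", "ganyu (genshin impact)"],
    cleanliness="very_clean_colorstyle",
    composition="good_composition",
    artist_tag="artist_ShinyColors",
)
print(prompt)
```

Since the composition tags were found to be largely ineffective, `composition` is optional in this sketch.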
Characters are inherited from Animagine 3.1: any character that A31 can generate directly can also be generated here. Likewise, obscure characters that had few training samples and steps in A31, and that A31 reproduces poorly, will not turn out well in this model either.
Training Parameters:
The detailed fine-tuning parameters are as follows:
Num Train Images: 10049
Batch Size: 6
Epochs: 20
Total Steps: 33496
Optimizer: Adafactor
Unet lr: 6e-6
Text lr: 4e-6
lr_scheduler: constant_with_warmup
Resolution: 1024x1024
Mixed Precision: BF16
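The reported step count agrees with the other parameters under one simple assumption: that total steps equal images times epochs divided by batch size, rounded down. The sketch below checks that arithmetic and also illustrates the usual shape of a constant_with_warmup schedule (linear ramp, then a flat rate). The function names are hypothetical, and `warmup_steps=500` is an illustrative value only, since the warmup length is not given in the parameters above.

```python
def total_training_steps(num_images: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps, assuming steps = images * epochs // batch_size."""
    return (num_images * epochs) // batch_size


def lr_constant_with_warmup(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then constant at base_lr."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


steps = total_training_steps(num_images=10049, batch_size=6, epochs=20)
print(steps)  # 33496, matching the reported Total Steps

# warmup_steps=500 is a made-up value for illustration only.
unet_lr_mid_warmup = lr_constant_with_warmup(250, base_lr=6e-6, warmup_steps=500)
unet_lr_after_warmup = lr_constant_with_warmup(10_000, base_lr=6e-6, warmup_steps=500)
```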
Limitations:
1. The training set for this demo mainly targets anime CG and flat-coloring styles, so the model has limited ability to render thick-paint (impasto), oil-painting, and similar effects. Training steps were limited, and the model inherits both the strengths and the weaknesses of A31.
2. The pre-training and fine-tuning datasets were tagged with wd-swinv2-tagger-v3. Its recognition ability is limited, so many labels are still wrong, which has polluted some concepts the base model originally had right. For example, a faucet is easily tagged as a bicycle or another vehicle, and easily confused concepts such as ball nets and wires get mislabeled, which degrades background generation to some extent. The same problem shows up when an image contains many concepts and details: the details are more likely to come out wrong.
3. It cannot draw hands.