On August 20, 2023, user @Machi provided us with a series of high-quality suggestions regarding our work (original link: https://civitai.com/models/131265/yamato-kantai-collection?commentId=230171&modal=commentThread).
Our team conducted an in-depth discussion on this matter and decided to publicly release the changes we have made and will make in the form of an article:
1. In the data filtering process, images with a long edge length lower than 320 pixels will be removed to reduce the impact of low-quality images.
2. For future model releases, two trigger words will be included: the character itself and the character's associated tags. For example, for the model mudrock/マドロック/泥岩 (Arknights), the original trigger word was mudrock_arknights; after this update, the two trigger words are mudrock_arknights and
horns, long_hair, red_eyes, bangs, pointy_ears, breasts, large_breasts, white_hair, cleavage, infection_monitor_\(arknights\), oripathy_lesion_\(arknights\), hair_ornament, jewelry, navel, very_long_hair, grey_hair.
Previously, because the associated tags were missing, some users ended up using the models incorrectly. In addition, associated tags were added to all 300+ already released models at around 2023-08-21 23:39 GMT.
3. After extensive discussion within our team, we acknowledge that the issue of outfit accuracy for character models generated by a fully automated pipeline (referring to a process with 100% automation and zero human intervention) is a challenge that cannot be entirely resolved. As a result, for upcoming model releases, the following explanation will be added regarding the issue of outfit accuracy:
Our current training data is sourced from various image websites, and for a fully automated pipeline it is difficult to predict accurately which official images a character has. Consequently, outfit generation relies on clustering based on labels from the training dataset in an attempt to achieve the best possible recreation. We will continue to work on this issue and attempt optimizations, but it remains a challenge that cannot be completely resolved, and the accuracy of outfit recreation is unlikely to match that of manually trained models. In fact, this model's greatest strengths are recreating the inherent characteristics of the characters themselves and its relatively strong generalization, owing to its larger dataset. As such, this model is well suited to tasks such as changing outfits, posing characters, and, of course, generating NSFW images of characters! 😉
In our internal testing, for training scenarios that require outfit accuracy, introducing a small amount of human intervention (such as 10% manual processing + 90% automation) can lead to significant improvements in effectiveness, and in many cases, even achieve high-quality outfit recreation. Therefore, in the future, we plan to further integrate the existing pipeline engineering code and make it available as tools to assist more trainers in improving work efficiency based on their actual training needs.
4. Similarly, based on the understanding in point 3, we believe that more example images should be provided in the future to fully demonstrate the model's strong generalization capabilities (which are the true advantage of this model), rather than relying on a small number of cluster-based prompt images as before. We have prepared some examples on Hugging Face, including multiple costumes and NSFW patterns (NSFW images are hidden behind links; the following URLs are safe to view):
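As a rough illustration of points 1 and 2 above, the long-edge filter and the two-part trigger words could be sketched as follows. This is only a minimal sketch, not the actual pipeline code: the function names (keep_image, filter_by_long_edge, build_prompt) are hypothetical, and the assumption that the character trigger word and the associated tags are simply joined with commas in a prompt is ours.

```python
from typing import Iterable, List, Sequence, Tuple

# Threshold stated in point 1: images whose long edge is below 320 px are removed.
MIN_LONG_EDGE = 320


def keep_image(width: int, height: int, min_long_edge: int = MIN_LONG_EDGE) -> bool:
    """Return True if the image's long edge is at least the threshold."""
    return max(width, height) >= min_long_edge


def filter_by_long_edge(
    images: Iterable[Tuple[str, int, int]],
    min_long_edge: int = MIN_LONG_EDGE,
) -> List[Tuple[str, int, int]]:
    """Keep only (name, width, height) entries that pass the long-edge check."""
    return [
        (name, w, h) for name, w, h in images if keep_image(w, h, min_long_edge)
    ]


def build_prompt(character_tag: str, associated_tags: Sequence[str]) -> str:
    """Join the character trigger word and its associated tags (assumed format)."""
    return ", ".join([character_tag, *associated_tags])


# Hypothetical sample data, for illustration only.
sample = [("a.png", 512, 768), ("b.png", 200, 319), ("c.png", 320, 64)]
kept = filter_by_long_edge(sample)

# Raw string keeps the backslash-escaped parentheses used in the tag list.
prompt = build_prompt(
    "mudrock_arknights",
    ["horns", "long_hair", "red_eyes", r"infection_monitor_\(arknights\)"],
)
```

In practice the width and height would come from the image files themselves, for example from the size attribute of an image opened with Pillow.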
Here are the upcoming changes we are planning to make:
1. Regarding the issue with the base model, we are still waiting for the HCP-Diffusion framework to address the format conversion problem. However, based on our communication with the author, new base models will likely be available soon, along with higher-quality preview images.
2. Despite the limitations acknowledged above, we will continue to do everything within our capabilities to improve outfit accuracy, and we have already begun working on some preliminary approaches.
Lastly, we extend our sincere gratitude to @Machi for the thoughtful and valuable suggestions. This has been a significant wake-up call for our team's work, providing clear direction for further improvement. Thank you very much!
Please continue to follow our work in the future.
We have just completed testing our trained LoRA on the following base models:
anything-v5 (used for the models' current preview images)
The LoRAs performed consistently on the aforementioned base models, capturing the core features of the characters while remaining distinctive and generalizing well, even when generating unclothed versions. After comparing the styles of all the base models and weighing various other factors, we have decided to use AniDosMix (special thanks to the author, DiaryOfSta), whose style sits roughly in the middle, as the base model for generating preview images of upcoming models. Here is an example of amiya. This selection will officially take effect on August 24, 2023, at 23:59 Greenwich Mean Time. Stay tuned for further updates.
In addition, we believe it is necessary to outline the boundaries of our work, and this declaration will also be included in the descriptions of the models we upload in the future. We do not recommend this model for the following groups, and we apologize for any disappointment:
Individuals who cannot tolerate any deviations from the original character design, even in the slightest detail.
Individuals whose use cases demand high accuracy in recreating character outfits.
Individuals who cannot accept the potential randomness in AI-generated images based on the Stable Diffusion algorithm.
Individuals who are not comfortable with the fully automated process of training character models using LoRA, or those who believe that training character models must be done purely through manual operations to avoid disrespecting the characters.
Individuals who find the generated image content offensive to their values.
Honestly, do you really have to let yourself be bothered by something that might be the poorest-quality LoRA you've ever come across? Isn't that just a bit unnecessary? :)