Here are some of the latest discoveries and progress:
Previously, when running inference with HCP-Diffusion, images generated by models like MeinaMix came out overly stylized because no VAE was applied. After incorporating the VAE model from NAI, the results have improved significantly. An example using MeinaMix V10 can be found here: Link to Example
After several hours on a GPU cluster, we regenerated preview images for around 50 of our LoRAs using the AniDosMix base model (without a VAE, of course). The overall results fell short of expectations. While the structure of clothing and characters improved compared to anything-v5, details such as hands still had significant issues, and some character LoRAs even exhibited strange feature losses, a phenomenon we did not observe with any other base model we tested.
A bug has been identified in the HCP-Diffusion training framework. Note that the LoRA models used for image generation in HCP-Diffusion and in a1111's webui use different data formats, so a conversion is required before a model can be used in the webui. The bug is located in the HCP-to-WebUI conversion module, which means the LoRA models previously uploaded for the WebUI may contain varying degrees of anomalies. Our team is still investigating the actual impact.
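To illustrate what such a conversion involves, here is a minimal sketch of remapping LoRA state-dict keys from one naming scheme to another. The key layouts shown are hypothetical stand-ins, not the actual schemas used by HCP-Diffusion or a1111's webui; the point is only that a bug in this kind of string rewriting silently corrupts every exported model.

```python
# Sketch of a LoRA key-format conversion. The key schemas here are
# ASSUMED for illustration: a trainer that stores weights under dotted
# module paths, and a webui that expects flattened "lora_*" keys.
def convert_keys(state_dict: dict) -> dict:
    converted = {}
    for key, tensor in state_dict.items():
        # e.g. "unet.down_blocks.0.attn1.to_q.lora_down.weight"
        #   -> "lora_unet_down_blocks_0_attn1_to_q.lora_down.weight"
        module_path, _, suffix = key.rpartition(".lora_")
        new_key = "lora_" + module_path.replace(".", "_") + ".lora_" + suffix
        converted[new_key] = tensor
    return converted
```

A mistake anywhere in this mapping (a wrong prefix, a missed separator) still produces a loadable file, which is why such a bug can go unnoticed until generated images look subtly off.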
Presently, mainstream character LoRA training uses portrait, half-body, and full-body images. Our previous training, however, used only simple character portraits without further differentiation or refinement.
Mainstream character LoRA training generally involves 10-12 epochs (referenced from: Link to Source). However, because many of our character LoRAs had training datasets of 200 images and a fixed budget of 1500 training steps, the effective training amounted to only about 7.5 epochs. This means the previously trained LoRA models are prone to varying degrees of underfitting. In fact, when testing over 50 character LoRA models against different base models, we found that approximately 10% exhibited some degree of underfitting. The phenomenon was particularly evident in characters with special features (like distinctively shaped horns), atypical colors (such as light hair between pink and brown, which is hard to describe precisely with wd14 tags), or complex hair colors (e.g., multi-colored hair or specific patterns of streaks).
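The arithmetic behind this underfitting is simple; a small helper makes it explicit (assuming a batch size of 1, which matches the document's 1500 steps / 200 images = 7.5 epochs):

```python
def effective_epochs(total_steps: int, dataset_size: int, batch_size: int = 1) -> float:
    """Effective epochs = optimizer steps x batch size / dataset size."""
    return total_steps * batch_size / dataset_size

# With the numbers from this report (batch size 1 assumed):
# effective_epochs(1500, 200) -> 7.5, below the 10-12 epochs
# commonly recommended for character LoRA training.
```

The fix is to scale the step budget with dataset size (or raise the batch size) so every character reaches the target epoch count.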
Based on the first two points, we may focus on testing MeinaMix in the future. If the results are satisfactory, MeinaMix could be used as a subsequent base model for image generation (heartfelt thanks to user Meina for their contributions).
Regarding the third point, the bug has been fixed, but its actual impact is still being confirmed.
Based on the fourth point, our current technical capabilities support anime image object detection, allowing us to separate characters from larger images. We can likewise run targeted detection of head and face regions via the online demo. This means generating headshots and full-body images is relatively feasible, but half-body images remain a challenge. Current approaches under consideration include:
Training a YOLOv8 model specialized for detecting half-body images (high accuracy, requires substantial labeled data and time for model refinement).
Exploring methods based on the position of the face within full-body images and using simple mathematical formulas to roughly crop half-body images (moderate accuracy, simple implementation).
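The second approach can be sketched as a small geometric helper. The specific ratios below (half a face-height of headroom, three face-heights of body below) are illustrative assumptions, not a tuned formula:

```python
def halfbody_crop(img_w, img_h, face_box, face_heights_below=3.0, top_margin=0.5):
    """Rough half-body crop from a detected face box (x0, y0, x1, y1).

    Heuristic (an assumption for illustration): keep from slightly above
    the face down to a few face-heights below it, clamped to the image.
    Returns a (left, top, right, bottom) crop box.
    """
    x0, y0, x1, y1 = face_box
    face_h = y1 - y0
    top = max(0, int(y0 - top_margin * face_h))
    bottom = min(img_h, int(y1 + face_heights_below * face_h))
    return (0, top, img_w, bottom)
```

In practice the returned box could be fed directly to an image-cropping call; the ratios would need tuning against real detections, which is why the YOLOv8 route promises higher accuracy at the cost of labeling effort.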
Based on the fifth point, we have modified some of the logic in the automatic training code to ensure more comprehensive training and have begun experimenting with new training approaches.
Once the aforementioned issues are resolved, we will consider retraining existing models uploaded to civitai using the improved training process.
In conclusion, this testing round has indeed revealed some real-world issues. We are extremely grateful for the valuable feedback from civitai users, both positive and negative, which has greatly assisted us.
Please continue to follow our work in the future.