
The Full Automation of the Three-Stage Cropping Process for Character Datasets


Updated on 2023-8-29:

Here is another successful attempt:

In terms of detail, the former performs much better than the latter.


Updated on 2023-8-28:

GOOD NEWS on training automation.

We ran an experiment training a LoRA of the character Ithea Myse Valgulious:

The whole training run took about 15 epochs.

This is what the character actually looks like:

This is the result of the first LoRA (native training, without three-stage cropping):

[Images: free.png, maid.png]

And this is the result of the second LoRA:

[Images: pattern_1.png, pattern_2.png]

It is easy to see that the second LoRA version significantly improves the overall fidelity of the character, and details such as the eyes, ears, and hair are better preserved. This indicates that the three-stage cropping method is indeed an effective way to improve training quality. We have successfully automated this method and produced a LoRA that may surpass the quality of all our previously trained LoRAs.

Perhaps we are finally ready to start preparing a major update to our existing models!


About the Full Automation of the Three-Stage Cropping Process for Character Datasets

In a previous article, we mentioned that cropping character images in three stages (full body, upper body, and head close-up) can effectively improve the model's training quality. In technical terms, this repeats the focus areas across the dataset, which effectively gives those areas more weight and results in better facial and upper-body detail. Moreover, when training a LoRA by hand, it is often hard to gather hundreds of training images (in practice, there are usually no more than twenty to thirty original images), so this method also helps mitigate problems caused by small datasets, such as overfitting.
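To make the weighting argument concrete, here is a minimal Python sketch that counts how many of the three crops cover a given point. It assumes the crops are nested (head inside upper body inside full body), as the three stages are described above. The box coordinates are made up for illustration, and `exposure_count` is a hypothetical helper, not part of the authors' code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def contains(box: Box, x: float, y: float) -> bool:
    """True if the point (x, y) lies inside the box (edges inclusive)."""
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def exposure_count(crops: List[Box], x: float, y: float) -> int:
    """How many training crops include the point (x, y)."""
    return sum(1 for c in crops if contains(c, x, y))


# Hypothetical nested crops for one character in one source image.
full_body = (0, 0, 400, 1000)
upper_body = (20, 0, 380, 500)
head = (120, 0, 280, 180)
crops = [full_body, upper_body, head]

print(exposure_count(crops, 200, 90))   # a point on the face: 3
print(exposure_count(crops, 200, 350))  # a point on the torso: 2
print(exposure_count(crops, 200, 800))  # a point on the legs: 1
```

In this example, the face is seen three times per source image, the torso twice, and the legs once, which is the extra weight on the focus areas described above.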

To achieve this, we trained full-body and head detection models for character images a long time ago. Recently, we also finished training an upper-body detection model for single-person images (online demo). Building on these, we integrated the three object detection models and achieved full automation of the three-stage cropping process. Here is (an example) we provide, similar to the image below:
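The integration step can be sketched in Python as follows. This sketch assumes each of the three detectors returns a list of `(x0, y0, x1, y1)` boxes for the whole image, and that each person's head and upper body are found by picking the box most contained in that person's full-body box. The matching rule, the 0.5 threshold, and all function names are illustrative assumptions, not the logic used in the Hugging Face space.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def contained_fraction(inner: Box, outer: Box) -> float:
    """Fraction of `inner`'s area that lies inside `outer` (0.0 to 1.0)."""
    inter = (max(inner[0], outer[0]), max(inner[1], outer[1]),
             min(inner[2], outer[2]), min(inner[3], outer[3]))
    a = area(inner)
    return area(inter) / a if a > 0 else 0.0


def best_match(person: Box, candidates: List[Box],
               min_frac: float = 0.5) -> Optional[Box]:
    """Return the candidate most contained in `person`, or None if none qualifies."""
    scored = [(contained_fraction(c, person), c) for c in candidates]
    scored = [s for s in scored if s[0] >= min_frac]
    return max(scored, key=lambda s: s[0])[1] if scored else None


def three_stage_crops(persons: List[Box], upper_bodies: List[Box],
                      heads: List[Box]) -> List[Dict[str, Optional[Box]]]:
    """Group detections into full-body / upper-body / head crops per character."""
    return [
        {
            "full_body": person,
            "upper_body": best_match(person, upper_bodies),
            "head": best_match(person, heads),
        }
        for person in persons
    ]


# Two characters standing side by side (hypothetical detector output).
persons = [(0, 0, 400, 1000), (500, 0, 900, 1000)]
upper_bodies = [(20, 0, 380, 500), (520, 0, 880, 480)]
heads = [(120, 0, 280, 180), (620, 10, 780, 190)]

for crop in three_stage_crops(persons, upper_bodies, heads):
    print(crop)
```

A greedy per-person match like this can assign the same head to two heavily overlapping characters. A production pipeline would likely need a one-to-one assignment, which is left out here for brevity.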

As you can see, each character in the scene has been detected and cropped individually into full-body, upper-body, and head close-up images. This process is now working and will be used to process the datasets for new models. If you are familiar with Python, you can check out the code in this Hugging Face space and use it to batch-process datasets locally.

Additionally, we have noticed that characters with fewer images may have a higher proportion of low-quality images, which significantly degrades the LoRA's training quality. Therefore, we are considering a new fully automated training method:

  • In the previous approach, we trained the LoRA directly on 200 character images (after filtering out sketches, monochrome images, line art, images of unrelated characters, etc.).

  • In future automated LoRA training, we could use a smaller number of images (e.g., 100 original images), which the three-stage cropping process would turn into a total of 300 training images.
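The 300-image figure assumes every crop is found for every source image and that each image contains a single character. A minimal Python sketch that counts the resulting training images from per-image detection results shows how missed detections reduce the total; the dictionary structure and the `expanded_size` name are hypothetical, not the authors' data format.

```python
from typing import Dict, List

STAGES = ("full_body", "upper_body", "head")


def expanded_size(found: List[Dict[str, bool]]) -> int:
    """Number of training images after three-stage cropping.

    Each dict records, for one character in one source image, which of
    the three crops were successfully detected (hypothetical structure)."""
    return sum(sum(1 for s in STAGES if entry.get(s, False)) for entry in found)


all_found = [{"full_body": True, "upper_body": True, "head": True}] * 100
print(expanded_size(all_found))  # 300

# Same 100 images, but the head detector missed 10 of them.
some_missing = all_found[:90] + [
    {"full_body": True, "upper_body": True, "head": False}
] * 10
print(expanded_size(some_missing))  # 290
```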

However, some image websites do not support sorting by image quality, so we need a model that can evaluate image quality. We have started annotating data for this model, working with annotators who have backgrounds in art or design to ensure the quality of the dataset.
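Once such a quality model exists, selecting the training images reduces to ranking by score. The Python sketch below assumes a scoring function that maps an image to a number, where higher means better. Here `score` is a plain dictionary lookup standing in for the not-yet-trained model, and `select_for_training` and its `min_score` cutoff are hypothetical names added for illustration.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_for_training(items: Sequence[T], score: Callable[[T], float],
                        k: int, min_score: Optional[float] = None) -> List[T]:
    """Return up to k items with the highest scores, best first."""
    scored = [(score(item), item) for item in items]
    if min_score is not None:
        scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:k]]


# Stand-in scores; a real pipeline would call the quality model instead.
quality = {"a.png": 0.9, "b.png": 0.2, "c.png": 0.7, "d.png": 0.5}

print(select_for_training(list(quality), quality.__getitem__, k=2))
print(select_for_training(list(quality), quality.__getitem__, k=3, min_score=0.6))
```

With the 100-image plan above, `k` would be 100, and the selected images would then go through three-stage cropping.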


Comments

duskydreams:

I have been looking for a tool like this for some time. Could this be extended to add crops such as a lower_body shot, or a full_body shot without the head (head out of frame), to capture more detail?

@duskydreams Headless upper-body cropping is easy. We already have well-trained object detection models for heads and faces, so it only takes some simple calculations on the detected upper-body and head regions. Cropping the lower body is much harder: body structure in anime images varies greatly, so it would need an approach similar to the one we used for upper-body cropping, namely manual data labeling and training a dedicated model. (In fact, to train the upper-body object detection model, we annotated over 3.5k images, and a similar number would likely be needed for the lower body.)
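The calculation for a headless crop can be as simple as moving the top edge of a body box down to the bottom of the detected head. Here is a minimal Python sketch of that idea, assuming upright characters, image coordinates with y increasing downward, and `(x0, y0, x1, y1)` boxes; `crop_below_head` and its `margin` parameter are hypothetical, not part of the authors' tooling.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


def crop_below_head(body: Box, head: Box, margin: float = 0.0) -> Optional[Box]:
    """Return the part of `body` below the head's bottom edge.

    `margin` moves the cut line further down (positive values) so the chin
    is dropped completely. Returns None if nothing of the body remains."""
    x0, y0, x1, y1 = body
    cut = max(y0, head[3] + margin)  # never cut above the body's own top edge
    if cut >= y1:
        return None
    return (x0, cut, x1, y1)


head = (120, 0, 280, 180)
print(crop_below_head((20, 0, 380, 500), head))             # headless upper body
print(crop_below_head((20, 0, 380, 500), head, margin=10))  # cut 10 px lower
print(crop_below_head((0, 0, 400, 1000), head))             # headless full body
```

The same function covers both requests in the comment above: applied to an upper-body box it gives a headless upper-body crop, and applied to a full-body box it gives a full-body crop with the head out of frame.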

e0972951006:

@narugo1992 Hello, can I make a request? The character is Inoue Orihime, who has her own appeal, from Bleach (see the OP 11 image).

She appears in Bleach: Thousand-Year Blood War: https://bleach.fandom.com/wiki/Orihime_Inoue

https://drive.google.com/drive/folders/1qsCFog3NYGBrA3EpGGnrGwr5s3KSI4TQ?usp=sharing

@e0972951006 Um, we think this dataset may still be too small. A better option might be to extract character images in bulk directly from the anime videos for training; we have recently developed this technique successfully. Could you provide magnet links to anime video resources that include this character (ideally a whole season in high definition, without subtitles or logos)? We are also curious how often the character appears: in our recent experiments, characters that appear frequently in the videos tend to train significantly better, sometimes approaching a near-perfect reproduction of the character's anime appearance.
