I see many people set out wanting to train a LoRA of their favorite anime character, yet they have no idea how to get started. They read tutorials, watch videos, and eventually realize that creating a dataset is the foundation of training a character LoRA. And that the creation process is extremely tedious! I've seen many people give up because of that, but I'm here to present an alternative idea: automate the shit out of it. This guide gives you a deep dive into using your favorite anime as the source for your character LoRA.
Requirements
You will need ffmpeg, the internet's favorite tool for automating anything dealing with video files, and Python. A basic understanding of Python, the programming language you will be using to automate the tedious tasks, is recommended. You don't need to be an expert, but understanding enough to read and execute code snippets is fundamental.
And finally, you need the video files of your favorite anime series stored on your computer! It is essential that they do not have hardcoded subtitles, because burned-in text would interfere with training the LoRA. If you are not sure what hardcoded subtitles even means, try to change or disable subtitles in your video player. If you can, you're good to go~
Keyframes and Keyframes
To train a character LoRA, you need images. A lot of images. And you know what? Movies are nothing more than a sequence of images to evoke the sense of movement. That means you have a lot to work with to train a LoRA! The basic idea here is to extract a whole lot of these images, but before you do, it is good to understand a few technical aspects of anime and video encoding.
An anime animator basically draws two types of images: keyframes and in-betweens (tweens). Keyframes are the most important images in anime. They show a character drawn in a way that is recognizable to the viewer, and they capture the start of a movement, the end of a movement, and the most important positions in between. The in-betweens are just there to transition gradually from one position to the next. You're interested in obtaining the keyframes, since they are the most accurate drawings. For a deeper dive into this concept, watch Shirobako.
In video encoding, the process of making video files as small as possible, we talk about keyframes as well. These are not the same as anime keyframes, which makes matters a bit confusing. To really simplify the concept: in video encoding, a keyframe (also called an I-frame) stores a complete image, while each subsequent frame stores only the changes from the frame before it. This is not that important to remember, just know that anime keyframes are not video keyframes.
With all that said, it turns out that video keyframes pretty much always land on anime keyframes! Encoders insert keyframes at scene cuts and big visual changes, and those moments tend to coincide with an animator's keyframes. That means you can simply extract video keyframes and get mostly anime keyframes, and that is more than enough to train a reliable LoRA. Okay... that was a lot of information! Now let's go~
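If you want to see these frame types for yourself, ffprobe (which comes bundled with ffmpeg) can print the type of every frame in a file. This is just a sanity check, not part of the pipeline, and scanning a full episode can take a minute or two:

```
ffprobe -v error -select_streams v:0 -show_entries frame=pict_type -of csv episode.mkv
```

Each output line ends in I, P, or B; the I-frames are exactly what the script below will extract.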
Extracting Keyframes
The general idea here is to run through every episode, extract its keyframes, and write them as images to your hard disk. Depending on the length of the anime, the output image folder can get quite large. For a 1080p series with 12 episodes, this easily reaches 30GB worth of images. Plan ahead and make sure you have enough disk space! Here is the Python script to automate this process:
```python
import os
import re
import subprocess

inputPath = '/path/to/your/anime/files'
outputPath = '/path/to/your/anime/keyframe'

for inputName in os.listdir(inputPath):
    # Only process video files.
    if re.search(r'\.(avi|mp4|mkv|ogm|webm)$', inputName):
        os.makedirs(outputPath, exist_ok=True)
        videoPath = os.path.join(inputPath, inputName)
        frameName = f'{inputName}-%06d.png'
        framePath = os.path.join(outputPath, frameName)
        # Select only keyframes (I-frames) and write each one as a numbered PNG.
        subprocess.run(['ffmpeg', '-i', videoPath, '-vf', 'select=eq(pict_type\\,I)', '-vsync', 'vfr', framePath], check=True)
```
This iterates through your episodes and extracts the keyframes. Make sure you change `inputPath` and `outputPath` to your real folders. If you're on Windows, you'll need to escape backslashes (`C:\\Videos` and not `C:\Videos`). If you have an error saying ffmpeg can't be found, make sure you add it to PATH. See How to Install FFmpeg on Windows.
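Alternatively, a Python raw string sidesteps the escaping problem entirely (the path here is just a placeholder):

```python
inputPath = r'C:\Videos\Anime'  # raw string: backslashes need no escaping
```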
Tagging Keyframes
Now that you have all the keyframes, you will need to tag them so you can find specific characters in the mess of tens of thousands of images. Anime models use Booru tags, but tagging images by hand is tedious, time consuming, and extremely annoying. To automate this process, you can use DeepDanbooru, an AI-powered tag estimation system. Go download the pretrained model under Releases and extract it. Then run this in Command Prompt or Terminal:
```
pip install git+https://github.com/KichangKim/DeepDanbooru.git
pip install tensorflow tensorflow-io
```
What you're going to do now is iterate through the images and estimate the tags. These tags should be written into text files with the same name as the image (for example, `Magical Girl Lyrical Nanoha - S04E01.mkv-000007.png` will get a `Magical Girl Lyrical Nanoha - S04E01.mkv-000007.txt`). Here is the Python script I came up with to automate this process:
```python
import deepdanbooru
import os

deepdanbooruPath = '/path/to/your/danbooru/model'
inputPath = '/path/to/your/anime/keyframe'
model = None

for i, frameName in enumerate(os.listdir(inputPath)):
    # Iterate each frame.
    print(f'{i + 1}', end='\r')
    framePath = os.path.join(inputPath, frameName)
    textName = os.path.splitext(frameName)[0] + '.txt'
    textPath = os.path.join(inputPath, textName)
    if not os.path.exists(textPath):
        # Initialize the dependencies.
        if model is None:
            # Initialize the danbooru model.
            model = deepdanbooru.project.load_model_from_project(deepdanbooruPath, compile_model=False)
            modelShape = model.input_shape
            # Initialize the danbooru tags.
            tags = deepdanbooru.project.load_tags_from_project(deepdanbooruPath)
            tagsCharacter = deepdanbooru.data.load_tags(os.path.join(deepdanbooruPath, 'tags-character.txt'))
            tagsGeneral = deepdanbooru.data.load_tags(os.path.join(deepdanbooruPath, 'tags-general.txt'))
        # Initialize the image.
        image = deepdanbooru.data.load_image_for_evaluate(framePath, width=modelShape[2], height=modelShape[1])
        imageShape = image.shape
        image = image.reshape((1, imageShape[0], imageShape[1], imageShape[2]))
        # Evaluate the image.
        result = model.predict(image)[0]
        resultCharacter = []
        resultGeneral = []
        # Keep tags that score above the 0.5 confidence threshold.
        for j, tag in enumerate(tags):
            if result[j] < 0.5:
                continue
            elif tag in tagsCharacter:
                resultCharacter.append(tag)
            elif tag in tagsGeneral:
                resultGeneral.append(tag)
        # Write the tags.
        valueArray = set(resultCharacter + resultGeneral)
        value = ', '.join(item.replace('_', ' ') for item in valueArray)
        with open(textPath, 'w') as file:
            file.write(value)
```
The script should be understandable enough, but it essentially iterates through your keyframes, evaluates each image with DeepDanbooru, and writes the tags to a text file. Make sure you change `deepdanbooruPath` and `inputPath` to your real folders. The resulting tags will not be perfect, but they will be good enough for training. With this step out of the way, you have a pile of tagged images. This will be the basis to select character images from!
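To give you an idea of what to expect, a generated text file looks something like this (an illustrative example, not real output; the tags depend entirely on the image):

```
1girl, blue eyes, long hair, open mouth, red hair, school uniform, smile, solo
```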
Finding Characters
Now you're ready to find your character! At the time of writing, the latest version of DeepDanbooru was trained in 2021. That means characters that existed before then can be found using their character tag. You can take a shortcut if this is the case and simply filter for the character tag, but for now, I'll explain under the assumption that your character is your seasonal waifu or something (but shame on you, a waifu is forever).
Identifying an anime character is easy. Appearances differ in all manner of things, but the most characteristic features are hair color and eye color. The idea here is to run through all the keyframes, look at the tags, and find something that matches your character. To find a female character with red hair, you might look for `1girl` and `red_hair`, but exclude `1boy`. If you've spent some time with Booru tags, this will be familiar. Here is the Python script I came up with to automate this process:
```python
import deepdanbooru
import os
import shutil

activationTag = 'waifu'
deepdanbooruPath = '/path/to/your/danbooru/model'
inputPath = '/path/to/your/anime/keyframe'
outputPath = '/path/to/your/anime/character'

tagsCharacter = deepdanbooru.data.load_tags(os.path.join(deepdanbooruPath, 'tags-character.txt'))
tagsGeneral = deepdanbooru.data.load_tags(os.path.join(deepdanbooruPath, 'tags-general.txt'))

for i, textName in enumerate(os.listdir(inputPath)):
    # Iterate each frame.
    print(f'{i + 1}', end='\r')
    textPath = os.path.join(inputPath, textName)
    if textName.endswith('.txt'):
        # Initialize the image tags.
        with open(textPath, 'r') as file:
            value = file.read()
        valueArray = [x.strip().replace(' ', '_') for x in value.split(',')]
        # Filter the image tags to character and general tags.
        frameCharacter = set(x for x in valueArray if x in tagsCharacter)
        frameGeneral = set(x for x in valueArray if x in tagsGeneral)
        # Find desired images.
        if ('1girl' in frameGeneral and
            '1boy' not in frameGeneral and
            'red_hair' in frameGeneral):
            # Initialize the image name.
            imageName = os.path.splitext(textName)[0] + '.png'
            imagePath = os.path.join(inputPath, imageName)
            # Copy the image.
            os.makedirs(outputPath, exist_ok=True)
            shutil.copyfile(imagePath, os.path.join(outputPath, imageName))
            # Write the tags, with the activation tag first.
            valueArray = [activationTag] + list(frameGeneral)
            value = ', '.join(item.replace('_', ' ') for item in valueArray)
            with open(os.path.join(outputPath, textName), 'w') as file:
                file.write(value)
```
Again, make sure you change `deepdanbooruPath`, `inputPath`, and `outputPath`. Running this will find all the keyframes with matching tags and copy them to your `outputPath`. That will be the dataset you train your LoRA on. You also want to change `activationTag` to the tag that will activate your character. I suggest using the Booru tag for your character so autocomplete works for your LoRA.
Filtering
I talked about finding a girl with red hair, and that is what the previous script does, but you are probably not looking for a girl with red hair! You will want to look for different tags. This is where some of that Python knowledge becomes even more useful. In the aforementioned script, you can find these three lines that determine which files are copied:
```python
if ('1girl' in frameGeneral and
    '1boy' not in frameGeneral and
    'red_hair' in frameGeneral):
```
Change this as you see fit. For example, finding a blonde girl:
```python
if ('1girl' in frameGeneral and
    '1boy' not in frameGeneral and
    'blonde_hair' in frameGeneral):
```
Perhaps there are multiple blonde girls in the show, and the one you want has red eyes:
```python
if ('1girl' in frameGeneral and
    '1boy' not in frameGeneral and
    'blonde_hair' in frameGeneral and
    'red_eyes' in frameGeneral):
```
Or maybe you're looking for that dashing husbando? Swap `1girl` and `1boy` like this:
```python
if ('1boy' in frameGeneral and
    '1girl' not in frameGeneral and
    'blonde_hair' in frameGeneral):
```
Existing Characters
If you are looking for a character that already existed back when the DeepDanbooru model you downloaded was trained, you might be able to use the Booru character tag instead of filtering general tags as we have been doing so far. For example, the protagonist of the show Magical Girl Lyrical Nanoha has the `takamachi_nanoha` tag. I could try to look for her like this:
```python
if ('1girl' in frameGeneral and
    '1boy' not in frameGeneral and
    'takamachi_nanoha' in frameCharacter):
```
Whether this succeeds depends on whether your character existed back when the model was trained, and on how popular they are.
Pruning Images
You now have a folder containing the tagged keyframes of your character of choice. Now you need to do some manual labor to prune these images: inspect them, and remove bad ones and their associated tag files from your dataset, otherwise your LoRA training will suffer. This step is not particularly hard, but it will take about half an hour of your time. Strap in, open your favorite image viewer, and look for these signs of bad images:
Wrong Character: You constructed your dataset from automatically estimated tags, so it is extremely likely that a few images depict the wrong character. These pollute your dataset, so remove them immediately.
Multiple Characters: DeepDanbooru is quite good at noticing when multiple characters are in an image and will tag them, but some images occasionally slip through, for example when the arm of another character pokes in at the side of the screen. An arm should not be too much of an issue, but remove these to err on the side of caution.
Duplicate Keyframes: When there are lengthy conversations, you will end up with multiple near-identical images of the same shot, especially when the camera switches back and forth between characters. You need image diversity, so keep only one image of each sequence (the hashing sketch after this list can help flag these).
Openings and Endings: You have probably extracted keyframes from an entire season, which produces duplicate images from the opening and ending of every episode. These are a little harder to spot since they are not next to each other, but try to remember what you have seen. You need image diversity, so keep only one of each.
Tweens: Remember the difference between keyframes and in-betweens, and that in-betweens are the frames between the important keyframes? This is the moment to inspect your images for malformed character faces, bad poses, and motion blur. Pay attention to these and remove them from your dataset.
Bad Quality: Now we are down to the last point: quality! Not all keyframes are equal. You need a diverse dataset that contains close-ups and full-body shots of your character, but some images are just bad. This happens, for example, when a character is in the distance and drawn with little detail. Look at each image, and if your character doesn't feel like your character, remove the image from your dataset.
You can consider the first four points objective and the last two subjective. Removing bad images and creating a diverse dataset is the goal here, but which images count as in-betweens or bad quality? There is no definitive answer; it depends on your judgment, so judge the images as you see fit. In the end, you are aiming for a dataset of 20-50 images, so depending on how many images you have to work with, you can be a little stricter or a little more forgiving.
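Duplicate hunting is the most mechanical part of this, so you can let the computer help. Below is a minimal sketch, assuming Pillow is installed (`pip install Pillow`) and `inputPath` points at your character folder. It computes a difference hash for every image and prints pairs that look nearly identical, including duplicates that are far apart like opening and ending shots. It only flags suspects for you to review; it deletes nothing:

```python
import os
from PIL import Image

inputPath = '/path/to/your/anime/character'

def dhash(imagePath, size=8):
    # Difference hash: shrink to a (size+1) x size grayscale thumbnail, then
    # record whether each pixel is brighter than its right-hand neighbor.
    image = Image.open(imagePath).convert('L').resize((size + 1, size), Image.LANCZOS)
    pixels = list(image.getdata())
    bits = []
    for row in range(size):
        for col in range(size):
            left = pixels[row * (size + 1) + col]
            right = pixels[row * (size + 1) + col + 1]
            bits.append(left > right)
    return bits

def distance(a, b):
    # Hamming distance: the number of bits that differ between two hashes.
    return sum(x != y for x, y in zip(a, b))

hashes = {}
for imageName in sorted(os.listdir(inputPath)):
    if not imageName.endswith('.png'):
        continue
    imageHash = dhash(os.path.join(inputPath, imageName))
    for otherName, otherHash in hashes.items():
        if distance(imageHash, otherHash) <= 4:  # tune this threshold to taste
            print(f'{imageName} looks like {otherName}')
            break
    else:
        hashes[imageName] = imageHash
```

A threshold of 4 out of 64 bits is a reasonable starting point; raise it to catch looser near-duplicates, or lower it if you get false positives. Always eyeball the flagged pairs before deleting anything.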
Conclusion
If you have been following along, you should have two folders: one containing all the tagged keyframes, and one containing the pruned character dataset. If you plan to train more character LoRAs from the same show, you can keep using the keyframe folder and simply apply different filters to select different characters. Questions or comments? Leave them below! Did you like this article? Then please give it a like~
Changelog
2023-06-27: Added information about pruning images in your dataset.