Fictional Characters Among Other Things: A LoRA Training Guide

Patch Notes (06/30/2024):

  • Changed the article title from "Characters, Clothing, Poses, Among Other Things: A Guide." I might change it back if needed.

  • Fixed a link regarding ReFocus Cleanly.

In With the New Update (05/14/2024):

  • Attachments have been updated. To use the new settings, you must switch to the dev branch of Derrian's trainer by entering the following into the command line: git clone -b dev https://github.com/derrian-distro/LoRA_Easy_Training_Scripts.git. So what has been updated?

    • PDXL settings: The optimizer and scheduler are now set to CAME and REX. The learning rates have been changed to unet 1e-4 and tenc 1e-6. These settings are VRAM efficient and can train faster.

    • DoRA settings: A new TOML has been added for PDXL training. If you want to know how to train DoRAs, these are the settings to start with.

  • Removed the Additional Tools section from the article. You can still check it out in the attachments.

  • I use chaiNNer to upscale my images now thanks to PotatCat's guide.

  • The Dataset Tag Editor extension section has been replaced with Particle1904's Dataset Processor.

  • Slight adjustments to the Preparing the Oven section. For the full settings for training XL LoRAs, use the provided attachments.

  • Removed the glossary section for LoRAs, LoCons, and LoHas.

  • Slight wording adjustments to everything else.

Preface

If you want to dedicate only 30-40 minutes of your time to making a simple character LoRA, I highly recommend checking out Holostrawberry's guide. It is very straightforward, but if you're interested in almost everything that I do and you're willing to set aside more time, continue reading. This guide builds a bit on his guide and Colab Notebooks, but you should still be able to get the gist of everything else I am about to cover.

The cover image uses FallenIncursio's Pepe Silvia Meme if you're wondering.

Introduction

Here's what this article currently covers:

  • Pre-requisites

  • Basic Character LoRA

  • Concepts, Styles, Poses, and Outfits

  • Multiple Concepts

  • Training Models Using AI Generated Images

  • Prodigy Optimizer

  • Training a LoRA in Pony Diffusion

  • What is a DoRA?

Use the chapters to the right of the page to navigate.

Baby Steps: Your First Character

We won't be too ambitious here, so let's start simple: baking your first character.

Here's a list of things that I use (or have used or recommend checking out).

"Woah, that's a lot of bullet points don't you think?"

Don't worry about those. Install the following first: Grabber, dupeGuru, Dataset Processor by Particle1904, and, if you're training locally, Derrian's LoRA Easy Training Scripts. You'll use Grabber, an easy-to-use tool, to download images from sites like Gelbooru and Rule34. Then you'll use dupeGuru to remove duplicate images that may negatively impact your training, and finally send the remaining images straight to the Dataset Processor for tagging.

Grabber

I use Gelbooru to download the images. You're familiar with booru tags, right? Hope your knowledge of completely nude and arms behind head carries you into this next section.

Got a character in mind? Great! Let's say I'll be training on a completely new character this website has never seen before, Kafka from Honkai: Star Rail!

If you want to use a site other than Gelbooru, click the Sources button at the bottom of the window. It's best to leave only one site checked.

So what should you put in that search bar at the top? For me, I'll type solo -rating:explicit -rating:questionable kafka_(honkai:_star_rail). You don't have to add -rating:questionable, but I want the characters to wear some damn clothes. You may also remove solo if you don't mind setting aside extra time to crop things out. That leaves -rating:explicit: should you remove it? It depends entirely on you, but I'll leave it. And just because, I'll throw in a shirt tag.
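A booru search is just a list of space-separated tags: multi-word tags use underscores, and a leading - excludes a tag. The sketch below assembles the same tags as the search above (in a different order). The function name build_booru_query is hypothetical; it is not part of Grabber.

```python
def build_booru_query(include, exclude=()):
    """Join tags into a booru-style search string.

    Spaces inside a tag become underscores, tags are separated by
    single spaces, and excluded tags get a leading '-'.
    """
    wanted = [tag.strip().replace(" ", "_") for tag in include]
    unwanted = ["-" + tag.strip().replace(" ", "_") for tag in exclude]
    return " ".join(wanted + unwanted)


query = build_booru_query(
    include=["solo", "kafka (honkai: star rail)", "shirt"],
    exclude=["rating:explicit", "rating:questionable"],
)
print(query)
# solo kafka_(honkai:_star_rail) shirt -rating:explicit -rating:questionable
```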

Well, this looks promising: 259 results. Hit the Get all button, then switch over to your Downloads tab.

This tab is where you can keep track of what you're planning to download. Before downloading, look at the bottom left and choose the folder where your downloaded images should go. Then right-click an item in the Downloads list and hit Download.

dupeGuru

All done? Great, let's switch over to dupeGuru.

Once it's open, add a folder location for it to scan. Click the + symbol at the bottom left and click Add Folder. Choose the folder where your images reside, then click Select Folder. If you want to adjust how strictly dupeGuru detects duplicate images, go to View > Options (or hit Ctrl + P), set the filter hardness at the top of the Options window, then hit OK. Once you're done with that, set the Application Mode at the top to Picture and hit Scan. When it's finished going through your images, I usually mark all, then delete (Ctrl + A and Ctrl + D if you wanna speedrun).

Note that this is not guaranteed to catch every duplicate image, so you'll still have to look through your dataset.
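If you want a quick extra pass before curating by hand, here's a minimal sketch that flags byte-identical files by hashing them. This is a simpler technique than dupeGuru's Picture mode: it only catches exact copies, not resized or re-encoded versions of the same image. The folder name dataset is a placeholder.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}


def find_exact_duplicates(folder):
    """Group images in `folder` whose file contents are byte-identical."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Only hashes shared by two or more files are duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]


dataset = Path("dataset")  # placeholder: point this at your image folder
if dataset.is_dir():
    for paths in find_exact_duplicates(dataset):
        keep, *extras = paths
        print(f"keep {keep.name}; duplicates: {[p.name for p in extras]}")
```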

Curating

Inspect the rest of your images for any duplicates dupeGuru might've missed, and get rid of any low-quality images that might degrade the output of your training. Fortunately for me, Kafka has plenty of good-quality images, so I'll be selecting at most 100! Try including other angles like from side and from behind, so that Stable Diffusion doesn't have to guess what the character looks like from those angles.

If there are images you really want to use, but they fall into one of these cases:

  • Multiple Views of One Character

  • Unrelated Character(s)

  • Cropped Torso/Legs/etc.

Then you'll have to do a bit of image manipulation. Use any image editing application of your choice.

As newer settings and technology pick up finer detail, it's best to remove signatures and watermarks from your dataset as well.

Improving the Quality of Your Output Model

If you think your dataset is good enough and you're not planning on training at a resolution greater than 512, then skip this step.

There is an article by @PotatCat on upscaling images and adjusting their contrast (useful for anime screencaps). I recommend reading it; it's easy to follow.

(In a WebUI of your choice) If you're screencapping an old, blurry anime, download ReFocus Cleanly and place it in the ESRGAN folder. Make sure to set Resize to 1 when using it.

Dataset Helper

Let's start auto-tagging your dataset. In the Dataset Helper, go to Generate Tags. Point the input and output folders to your dataset folder. Set the auto tagger model to WDv3 and the threshold for predictions to 0.25. Make sure the first option is checked. Click the Generate tags button at the bottom.


When it's done, head on over to the Process Tags screen.

In the "Tags to add" field, type your trigger word.

In the "Tags to remove" field, type out the tags you want to prune, such as hair length, hairstyle, and eye color. Don't know what you can remove? Click Calculate frequency. Pruned tags will be absorbed into the trigger word.

I recommend checking "Would you like to rename images and their .txt files to crescent order?" (i.e., ascending order). Hit the Process tags button at the bottom when you're done.
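To make the Process Tags step concrete, here's a rough sketch of what it does to comma-separated captions: count how often each tag appears, drop the tags listed under "Tags to remove", and add the trigger word. This is an approximation, not Dataset Processor's actual code. Putting the trigger word first is my assumption (it pairs with Keep Tokens set to 1), and the example captions are made up.

```python
from collections import Counter
from pathlib import Path


def read_tags(caption):
    """Split a comma-separated caption into clean tags."""
    return [tag.strip() for tag in caption.split(",") if tag.strip()]


def tag_frequency(captions):
    """Count how many captions contain each tag (like Calculate frequency)."""
    counts = Counter()
    for caption in captions:
        counts.update(set(read_tags(caption)))
    return counts


def process_caption(caption, trigger, remove):
    """Drop pruned tags and put the trigger word first."""
    kept = [tag for tag in read_tags(caption) if tag not in remove and tag != trigger]
    return ", ".join([trigger] + kept)


def process_folder(folder, trigger, remove):
    """Rewrite every .txt caption in `folder` in place."""
    for txt in sorted(Path(folder).glob("*.txt")):
        caption = txt.read_text(encoding="utf-8")
        txt.write_text(process_caption(caption, trigger, remove), encoding="utf-8")


captions = ["1girl, purple hair, long hair, sunglasses", "1girl, purple hair, jacket"]
print(tag_frequency(captions).most_common(2))
print(process_caption(captions[0], "kafka", {"purple hair", "long hair"}))
# kafka, 1girl, sunglasses
```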


I strongly suggest manually editing your captions, so head on over to the Tag/Caption Editor. This is where you'll add or remove tags. The application explains how this page works, so check it out. Use the Danbooru wiki to read up on tags; you might learn a thing or two.

Be descriptive and unique with your tags to minimize bleeding in your model!

Preparing the Oven

Settings for 1.5 and PDXL are included in the attachments.

I use NovelAI (or equivalently, nai or animefull) as the base training model. I do not know whether training on unpruned or pruned makes a difference.

Here are my settings:

  • General Args

    • Model

      • Base Model: NAI

      • External VAE: None

    • Resolution: 768

      • Set it to 512 to save VRAM and decrease training times.

    • Gradient Checkpointing: False

      • Set it to True to save VRAM and increase training times by ~30%.

    • Gradient Accumulation: 1

    • Clip Skip: 2

    • Batch Size: 2

    • Training Precision: bf16

      • Set it to fp16 if your hardware doesn't support bf16.

    • Max Token Length: 225

    • Memory Optimization: SDPA

    • Cache Latents: True, To Disk: True

  • Network Args

    • LoRA Type: LoRA

    • Network Dim/Alpha: 16/8

  • Optimizer Args

    • Main Args

      • Optimizer Type: AdamW8bit

        • Set it to AdamW if you have an old graphics card that doesn't support AdamW8bit.

      • LR Scheduler: cosine with restarts

      • Loss Type: L2

      • Learning Rate: 5e-4 or 0.0005

      • Unet Learning Rate: 5e-4 or 0.0005

      • TE Learning Rate: 1e-4 or 0.0001

        • This should be a fifth of your Unet LR.

      • MIN SNR Gamma: 5

      • Warmup Ratio: 0.05

        • This is used if your LR scheduler is set to constant with warmup.

    • Optional Args:

      • weight_decay=0.1

      • betas=[0.9,0.99]

  • Bucket Args

    • Enable it

    • Maximum Bucket Resolution: 1024

    • Bucket Resolution Steps: 64
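Two numbers in that list are easy to sanity-check by hand. In kohya-based trainers (which Derrian's scripts are built on), the LoRA update is multiplied by alpha ÷ dim, so 16/8 scales it by 0.5, and the TE learning rate above is a fifth of the Unet's. The sketch also counts the trainable weights a LoRA adds to a single linear layer; the 768 × 768 layer size is only an illustrative assumption.

```python
def lora_scale(network_dim, network_alpha):
    """Factor applied to the LoRA update: alpha / dim."""
    return network_alpha / network_dim


def lora_param_count(in_features, out_features, network_dim):
    """Weights in the down (dim x in) and up (out x dim) matrices of one layer."""
    return network_dim * in_features + out_features * network_dim


unet_lr = 5e-4
te_lr = unet_lr / 5  # "a fifth of your Unet LR"

print(lora_scale(16, 8))               # 0.5
print(lora_param_count(768, 768, 16))  # 24576
print(f"{te_lr:.0e}")                  # 1e-04
```

Raising the dim grows the parameter count linearly, which is one reason higher dims produce larger files.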

If you have a trigger word, make sure that Keep Tokens is set to at least 1. Enabling Shuffle Captions will supposedly make your LoRA less rigid.
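Here's roughly how Keep Tokens and Shuffle Captions interact, as I understand them: the caption is split on commas, the first Keep Tokens tags stay in place, and the rest are shuffled each time the image is seen. This is a simplified sketch, not the trainer's code, and the caption is made up.

```python
import random


def shuffle_caption(caption, keep_tokens=1, rng=None):
    """Shuffle comma-separated tags, leaving the first `keep_tokens` in place."""
    rng = rng or random.Random()
    tags = [tag.strip() for tag in caption.split(",") if tag.strip()]
    fixed, rest = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(rest)  # only the tags after the kept ones move around
    return ", ".join(fixed + rest)


caption = "kafka, 1girl, solo, sunglasses, jacket"
print(shuffle_caption(caption, keep_tokens=1, rng=random.Random(0)))
```

With keep_tokens=0 the trigger word can land anywhere, which is why a trigger word wants Keep Tokens of at least 1.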

If you're a Colab user and plan to use Organize by project, you should have a folder called Loras in your Google Drive. Make sure your folder structure looks like this: <your folder name> -> dataset, where dataset is a subfolder containing your images and text files. Once you've checked that your structure is correct, upload it to the Loras folder in your Google Drive.

Now while it's uploading, let's go over how many repeats and epochs you should use. First, how many images do you have? I did say I would choose up to 100 images for my dataset, so let's go over Holostrawberry's reference table.

20 images × 10 repeats × 10 epochs ÷ 2 batch size = 1000 steps

100 images × 3 repeats × 10 epochs ÷ 2 batch size = 1500 steps

400 images × 1 repeat × 10 epochs ÷ 2 batch size = 2000 steps

1000 images × 1 repeat × 10 epochs ÷ 3 batch size ≈ 3300 steps
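The table is plain arithmetic, so here's a tiny sketch for checking your own numbers. total_steps uses the same images × repeats × epochs ÷ batch size formula; the second function shows that the count your trainer reports may come out slightly higher, since a partial batch at the end of an epoch (or of a resolution bucket) still costs a step.

```python
import math


def total_steps(images, repeats, epochs, batch_size):
    """Table formula: images x repeats x epochs / batch size."""
    return images * repeats * epochs / batch_size


def steps_with_partial_batches(images, repeats, epochs, batch_size):
    """Same, but rounding each epoch's leftover images up to a full step."""
    return math.ceil(images * repeats / batch_size) * epochs


for row in [(20, 10, 10, 2), (100, 3, 10, 2), (400, 1, 10, 2), (1000, 1, 10, 3)]:
    print(row, round(total_steps(*row)), steps_with_partial_batches(*row))
```

The first three rows match the table exactly; the last works out to about 3,333 steps (3,340 with per-epoch rounding).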

According to this table, I should set my repeats to 3 and epochs to 10, so that's what I'll do. After that, all I really need to do is set project_name in Google Colab to the name of my project folder in Drive. In my case, it's hsr_kafka.

For those training locally, this is your folder naming scheme: repeats_projectname, where you replace repeats with the number of repeats and projectname with whatever you want (for example, 3_hsr_kafka).
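If you script your folder setup, the naming scheme is easy to build and check. The helper names below are hypothetical; the scheme itself (repeats, an underscore, then the project name) is the one described above. Splitting only at the first underscore keeps project names like hsr_kafka intact.

```python
def dataset_folder_name(repeats, project):
    """Build a 'repeats_projectname' folder name."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return f"{repeats}_{project}"


def parse_dataset_folder(name):
    """Split 'repeats_projectname' into (repeats, projectname)."""
    repeats, sep, project = name.partition("_")  # split at the first '_' only
    if not sep or not repeats.isdigit() or not project:
        raise ValueError(f"expected 'repeats_projectname', got {name!r}")
    return int(repeats), project


print(dataset_folder_name(3, "hsr_kafka"))  # 3_hsr_kafka
print(parse_dataset_folder("3_hsr_kafka"))  # (3, 'hsr_kafka')
```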

  • If your GPU has more than 9 or 10 GB of VRAM, you can train at a resolution of 768 with batch size 2 and XFormers enabled. Try not to run anything else resource-intensive while the trainer is busy. Otherwise, lower those settings.

Great, I think that settles it. Let's run it and let the trainer handle the rest.


Is it Ready?

Is your LoRA finished baking? You can choose to either download a few of your latest epochs or all of them. Either way, you'll be testing to see if your LoRA works.

Head back to Stable Diffusion and start typing your prompt out. For example,

<lora:hsr_kafka-10:1.0>, solo, 1girl, kafka, sunglasses, eyewear on head, jacket, white shirt, pantyhose

Then enable the "X/Y/Z plot" script. Set your X type to Prompt S/R (search/replace): it takes the first value in your list as the search term and replaces it in your prompt with each value in turn. In X values, type something like -10, -09, -08, -07. This finds -10 in your prompt and replaces it with -09, -08, and -07 (the first value leaves the prompt unchanged). Then hit Generate and find out which epoch works best for you.

Once you've chosen your best epoch, test which weight works: for your X values, type something like 1.0>, 0.9>, 0.8>, 0.7>, 0.6>, 0.5>. Hit Generate again.
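Here's a sketch of what Prompt S/R does with those values: the first value is the search term, and every value in the list (including the first, which leaves the prompt as-is) produces one prompt. It is an illustration, not the WebUI's code. This version replaces every occurrence of the search term, so keep that term unique in your prompt.

```python
def prompt_sr(prompt, values):
    """Return one prompt per value, substituting values[0] with each value."""
    search = values[0]
    if search not in prompt:
        raise ValueError(f"search term {search!r} not found in prompt")
    # str.replace swaps every occurrence, so keep the search term unique.
    return [prompt.replace(search, value) for value in values]


prompt = "<lora:hsr_kafka-10:1.0>, solo, 1girl, kafka, sunglasses, eyewear on head"

# Epoch sweep: -10 is searched for and swapped with each value.
for variant in prompt_sr(prompt, ["-10", "-09", "-08", "-07"]):
    print(variant)

# Weight sweep on the chosen epoch, using the same mechanism.
for variant in prompt_sr(prompt, ["1.0>", "0.9>", "0.8>"]):
    print(variant)
```

Including the closing > in the weight values keeps 1.0 from matching anything else in the prompt.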

Your LoRA should ideally work at weight 1.0, but it's okay if it works best around 0.8 since this is your first time after all. Training a LoRA is an experimental game, so you'll be messing around with tagging and changing settings most of the time.

Concepts, Styles, Poses, and Outfits

Now that you know the basics of training a character LoRA, what if you want to train a concept, a style, a pose, and/or an outfit? Look for consistency and provide proper tagging.

For concepts: Add an activation tag. You may choose to prune any related tags or leave them in. Here's an example. Notice that it only takes one tag to prompt a character holding the Grimace Shake. One element that remained consistent is the shake that appears in each image of the dataset. I've pruned tags such as holding and cup.

For styles: I prefer not to add an activation tag, so that all the user needs to do is call the model and prompt away. Just let the autotagger do its work, then immediately save & exit. Here's an example. Make sure there's style consistency across all images. You'll want to raise the epochs and test each one. Lower the Unet LR if necessary. You may lower the tenc LR or set it to zero. Set Keep Tokens to 0 if you're not using an activation tag.

For poses: Add an activation tag. You may choose to prune any related tags or leave them in. Here's an example. In the dataset, there was consistency of random characters putting their index fingers together.

For outfits: Add an activation tag. You may choose to prune any related tags or leave them in. Here's an example. I've pruned tags such as cross and thighhighs.

There are some tags worth leaving in so that certain concepts are learned properly.

Multiple Concepts

Sorting

This part will cover how to train a singular character who wears multiple outfits. You can apply the general idea of this method to multiple characters and concepts.

So you have an assortment of images. You're going to want to organize those images into separate folders that each represent a unique outfit.

Now let's say you're left with 4 folders with the following number of images:

  • Outfit #1: 23 images

  • Outfit #2: 49 images

  • Outfit #3: 100 images

  • Outfit #4: 79 images

Let's make things easier. Delete 3 images in the folder for outfit #1, 16 images in #2, and 29 images in #4. I'll elaborate on this later.

Tagging

Now you'll associate each outfit with its own activation tag. I'll use Zeta from Granblue Fantasy as an example. These are my triggers for each outfit:

  • zetadef

  • zetasummer

  • zetadark

  • zetahalloween

Of course, I've pruned hair color, hair length, and eye color, but I've left the hairstyle and clothing tags in. You can choose to prune these as well and bake them into each activation tag.

Training Settings

Remember when I told you to delete a specific number of images in that hypothetical dataset of yours? What you'll be doing is trying to train each outfit equally, despite the differences in their image count. Here are the updated folders:

  • Outfit #1: 20 images

  • Outfit #2: 33 images

  • Outfit #3: 100 images

  • Outfit #4: 50 images

If you asked Holostrawberry, he'd suggest the following repeats for each folder:

  • Outfit #1: 5 repeats

  • Outfit #2: 3 repeats

  • Outfit #3: 1 repeat

  • Outfit #4: 2 repeats
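One way to arrive at those repeats is to aim for roughly the same number of images per folder per epoch. The sketch below assumes a target of about 100; that target is my own reading, picked because it reproduces the four numbers above, not a rule stated by Holostrawberry.

```python
def balance_repeats(image_counts, target=100):
    """Pick repeats so each folder contributes roughly `target` images per epoch."""
    return {name: max(1, round(target / count)) for name, count in image_counts.items()}


counts = {"outfit1": 20, "outfit2": 33, "outfit3": 100, "outfit4": 50}
print(balance_repeats(counts))
# {'outfit1': 5, 'outfit2': 3, 'outfit3': 1, 'outfit4': 2}
```

The max(1, ...) keeps a very large folder from being rounded down to zero repeats.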

If you're using his Colab notebook, head down to the section where it says, "Multiple folders in dataset." Here's what your cell should look like:

custom_dataset = """
[[datasets]]

[[datasets.subsets]]
image_dir = "/content/drive/MyDrive/Loras/PROJECTNAME/outfit1"
num_repeats = 5

[[datasets.subsets]]
image_dir = "/content/drive/MyDrive/Loras/PROJECTNAME/outfit2"
num_repeats = 3

[[datasets.subsets]]
image_dir = "/content/drive/MyDrive/Loras/PROJECTNAME/outfit3"
num_repeats = 1

[[datasets.subsets]]
image_dir = "/content/drive/MyDrive/Loras/PROJECTNAME/outfit4"
num_repeats = 2

"""

Let's do some math: (20 × 5) + (33 × 3) + (100 × 1) + (50 × 2) = 399

We'll label this number as T. Now let's determine how many epochs we should get. This is what I usually turn to:

200 T × 17 epochs ÷ 2 batch size = 1700 steps

300 T × 12 epochs ÷ 2 batch size = 1800 steps

400 T × 10 epochs ÷ 2 batch size = 2000 steps

Our T is closest to the last row, so we'll run with 10 epochs.
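The same calculation as a short sketch: T is the sum of images × repeats over all folders, and the epoch count comes from whichever row of the table above has the closest T. The nearest-row rule is simply how I'd automate the lookup.

```python
EPOCH_TABLE = {200: 17, 300: 12, 400: 10}  # T -> epochs, from the table above


def total_images_per_epoch(folders):
    """T = sum of (images x repeats) over all folders."""
    return sum(images * repeats for images, repeats in folders)


def pick_epochs(t, table=EPOCH_TABLE):
    """Use the row whose T is closest to ours."""
    closest = min(table, key=lambda row_t: abs(row_t - t))
    return table[closest]


folders = [(20, 5), (33, 3), (100, 1), (50, 2)]  # (images, repeats) per outfit
t = total_images_per_epoch(folders)
epochs = pick_epochs(t)
print(t, epochs, t * epochs // 2)  # 399 10 1995
```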


If you're using Derrian's LoRA Easy Training Scripts, you should see something like this:

This is how you'll control the repeats for each folder. You can prepend a number to each folder's name to automatically input the number of repeats.

With that out of the way, start the trainer!

Using AI Generated Images for Training

Can it be done? Yes, absolutely, for sure. We even created an anime mascot for this website!

If you're working to improve your models, choose your best generations (i.e., the most accurate representations of your model). Inspect your images carefully; Stable Diffusion alone is already bad enough with hands. Don't make your next generations worse by neglecting your dataset.

Prodigy

The Prodigy optimizer is a supposed successor to the DAdaptation optimizers, and it's been out for quite some time. Two of my early uses of it were Wendy (Herrscher of Wind) and Princess Bullet Bill. It learns aggressively and is recommended for small datasets (I'll typically throw around 20 to 30 images at it, give or take). The optimizer is good but not amazing by any means, since it seems to mess up some details like tattoos. If you want to mess around with it, here are the settings you would modify:

  • optimizer: Prodigy

  • learning_rate: 0.5 or 1 (unet lr and tenc lr are the same)

  • network_dim/network_alpha: 32/32 or 16/16

  • lr_scheduler: constant_with_warmup

  • lr_warmup_ratio: 0.05

  • optimizer_args: "decouple=True, weight_decay=0.01, betas=[0.9,0.999], d_coef=2, use_bias_correction=True, safeguard_warmup=True"

For repeats, aim for about 100 steps per epoch (images × repeats). For example, if I have 20 images, I would go with 5 repeats. For epochs, just set it to 20. While it's training, you'll see the loss reported, like here:

When it's nearly done training, look for the epoch where the loss is at its lowest point. It'll typically be around 800 steps or so.
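The per-step loss is noisy, so its "lowest point" is easier to judge on a smoothed curve. Here's a minimal sketch using a centered moving average over logged (step, loss) pairs. The window size is arbitrary and the example data is synthetic, not from a real run; pick the saved epoch closest to the step it returns.

```python
def centered_moving_average(values, window=5):
    """Average each point with its neighbours (the window shrinks at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def lowest_loss_step(steps, losses, window=5):
    """Return the logged step where the smoothed loss is lowest."""
    smoothed = centered_moving_average(losses, window)
    best = min(range(len(smoothed)), key=smoothed.__getitem__)
    return steps[best]


# Synthetic example: loss bottoms out at step 800, then rises again.
steps = list(range(0, 2000, 100))
losses = [abs(step - 800) / 1000 + 0.1 for step in steps]
print(lowest_loss_step(steps, losses))  # 800
```

A centered window is used instead of a trailing one so the minimum isn't shifted later in training.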

EDIT (May 14th, 2024): I've noticed some creators using Prodigy with datasets larger than the recommended size at high repeats. The model will be burnt to a crisp. You don't have to do this.

Training a LoRA in Pony Diffusion

Pony Diffusion V6 XL (PDXL) is a finetune over the Base XL 1.0 model. It's heavy but it knows more characters and concepts, though it's not great with backgrounds. You'll have to be more descriptive with your tagging (e.g., blue jacket, red pants, black gloves).

Preparing the Oven

NOTE TO TRAINERS USING DERRIAN'S: You must use the dev branch of Derrian's LoRA Easy Training Scripts. You can get it by entering the following into the command line: git clone -b dev https://github.com/derrian-distro/LoRA_Easy_Training_Scripts.git. Here are the settings that work with 12 GB of VRAM:

  • General Args

    • Model

      • Base Model: Pony Diffusion V6 XL

      • External VAE: SDXL Vae

      • SDXL Based: True

      • No Half Vae: True

      • Full BF16: True

    • Resolution: 1024

    • Gradient Checkpointing: True

    • Gradient Accumulation: 1

    • Batch Size: 2

    • Max Token Length: 225

    • Memory Optimization: SDPA

    • Cache Latents: True, To Disk: True

  • Network Args

    • LoRA Type: LoRA

    • Network Dim/Alpha: 8/4

  • Optimizer Args

    • Main Args

      • Optimizer Type: CAME

      • LR Scheduler: REX

      • Loss Type: L2

      • Learning Rate: 1e-4 or 0.0001

      • Minimum Learning Rate: 1e-6 or 0.000001

      • Unet Learning Rate: 1e-4 or 0.0001

      • TE Learning Rate: 1e-6 or 0.000001

      • MIN SNR Gamma: 5 / 8

      • Warmup Ratio: 0.05

    • Optional Args:

      • weight_decay=0.08

  • Bucket Args

    • Enable it

    • Maximum Bucket Resolution: 2048

    • Bucket Resolution Steps: 64
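For the curious, here's a sketch of the learning-rate curve these settings imply. The decay uses the REX profile, lr = max × (1 − z) ÷ (1 − z/2), where z is the fraction of post-warmup training completed. The linear warmup and the way the minimum LR is blended in are my assumptions, so the trainer's actual implementation may differ in details.

```python
def rex_lr(step, total_steps, max_lr=1e-4, min_lr=1e-6, warmup_ratio=0.05):
    """Learning rate at `step`: linear warmup, then REX decay toward min_lr."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Assumed linear warmup from near zero up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    # z: fraction of the post-warmup budget that has elapsed (0 -> 1).
    z = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    factor = (1 - z) / (1 - z / 2)  # REX profile: 1 at z=0, 0 at z=1
    # Assumption: the minimum LR acts as the floor the curve decays toward.
    return min_lr + (max_lr - min_lr) * factor


total = 400  # e.g. 100 images x 2 repeats x 4 epochs / batch size 2
for step in (0, 19, 20, 210, 399, 400):
    print(step, f"{rex_lr(step, total):.2e}")
```

Compared with a plain linear decay, REX stays near the peak LR a bit longer and then drops faster at the end.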

Repeats & Epochs

Here is my table for setting repeats and epochs:

20 images × 2 repeats × 10 epochs ÷ 2 batch size = 200 steps

40 images × 2 repeats × 6 epochs ÷ 2 batch size = 240 steps

60 images × 2 repeats × 5 epochs ÷ 2 batch size = 300 steps

100 images × 2 repeats × 4 epochs ÷ 2 batch size = 400 steps

200 images × 2 repeats × 3 epochs ÷ 2 batch size = 600 steps

600 images × 2 repeats × 2 epochs ÷ 2 batch size = 1200 steps

It never hurts to add an extra epoch. You may need to lower the Unet learning rate if your dataset is bigger than the sizes listed here.

What is a DoRA?

Known as Weight-Decomposed Low-Rank Adaptation, a DoRA is a type of LoRA that splits each weight into a magnitude and a direction: the magnitude is trained directly as an extra set of parameters, while the direction is updated through a regular low-rank adapter. This brings its behavior closer to full fine-tuning. Training a DoRA trades speed for greater detail and accuracy, which may be in your best interest if you're training a style or a concept.

You can check out their repo and paper here: https://github.com/catid/dora
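To make "magnitude and direction" concrete, here's a tiny pure-Python sketch of the DoRA weight as described in the paper: W′ = m · (W₀ + BA) ÷ ‖W₀ + BA‖_c, where ‖·‖_c is the norm of each column, BA is an ordinary LoRA update, and m is a trainable per-column magnitude initialized to the column norms of W₀. It's for illustration only (trainers do this on GPU tensors), and the matrices are made up.

```python
import math


def matmul(a, b):
    """Multiply two row-major matrices (lists of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def column_norms(matrix):
    """L2 norm of each column."""
    return [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(len(matrix[0]))]


def dora_weight(w0, lora_b, lora_a, magnitude):
    """W' = m * (W0 + B A) / ||W0 + B A||_c, applied column by column."""
    delta = matmul(lora_b, lora_a)  # the low-rank update changes the direction
    v = [[w0[i][j] + delta[i][j] for j in range(len(w0[0]))] for i in range(len(w0))]
    norms = column_norms(v)
    # Each column becomes a unit direction scaled by its trainable magnitude.
    return [[magnitude[j] * v[i][j] / norms[j] for j in range(len(v[0]))]
            for i in range(len(v))]


w0 = [[3.0, 1.0], [4.0, 2.0]]  # made-up 2x2 pretrained weight
b = [[0.0], [0.0]]             # LoRA B starts at zero (rank 1)
a = [[0.5, -0.2]]              # LoRA A
m = column_norms(w0)           # magnitude starts at W0's column norms
print(dora_weight(w0, b, a, m))  # equals w0 (up to rounding) at initialization
```

Because B starts at zero and m starts at W₀'s column norms, a fresh DoRA reproduces the base weight exactly; training then adjusts m and BA separately.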

Preparing the Oven

Take the above PDXL settings and make changes to the following:

  • Network Args

    • LoRA Type: LoCon (LyCORIS)

    • Conv Dim/Alpha: 8/4

    • DoRA: True

  • Optimizer Args

    • Main Args

      • Learning Rate: 5e-5 or 0.00005

      • Unet Learning Rate: 5e-5 or 0.00005

(Old) Final Thoughts

Hello, if you made it here, thank you for taking the time to read the article. I promised to write this article to share everything that I've done since that announcement. That said, I rushed some things toward the end, so this article is not completely final just yet. If you have any questions or criticisms, please let me know! If there's something you think can be done more efficiently, please let me know! Treat this as a starting point for your own way of training LoRAs. Not everything here is perfect, and no method of training LoRAs ever is.

And remember, making LoRAs is an experimentation game.
