Characters, Clothing, Poses, Among Other Things: A Guide for SD1.5

The Belated New Years Update (01/03/2024):

Update (01/09/2024):

  • Made a small change to the Python code.

  • Slight wording adjustments.

  • Attached a .zip with two TOMLs in it so you don't have to input the settings every time. You'll still need to change some settings though.

Preface

If you want to dedicate only 30-40 minutes of your time to making a simple character LoRA, I highly recommend checking out Holostrawberry's guide. It is very straightforward, but if you're interested in looking into almost everything that I do and you're willing to set aside more time, continue reading. This guide piggybacks a bit off of his guide and Colab Notebooks, but you should still be able to get the gist about everything else I am about to cover.

The cover image uses FallenIncursio's Pepe Silvia Meme if you're wondering.

Introduction

Here's what this article currently covers:

  • Pre-requisites

  • Basic Character LoRA

  • Additional Tools

  • LoRA, LoCon, LoHa: A Brief Glossary

  • Concepts, Styles, Poses, and Outfits

  • Multiple Concepts

  • Training Models Using Generated Images

  • Prodigy Optimizer

Baby Steps: Your First Character

We won't start off too ambitious here, so let's start out simple: baking your first character.

Here's a list of things that I use.

"Woah, that's a lot of bullet points don't you think?"

I'll get to some of those bullet points real soon, but here's what you're gonna get first: the Grabber, dupeGuru, and the dataset tag editor. You'll be using an easy-to-use tool to download images off of sites like Gelbooru and Rule34. Then you'll be using dupeGuru to remove any duplicate images that may negatively impact your training, and finally send the remainder of your images straight to the dataset tag editor.

Grabber

I use Gelbooru to download the images. You're familiar with booru tags, right? Hope your knowledge of completely nude and arms behind head carries you into this next section.

Got a character in mind? Great! Let's say I'll be training on a completely new character this website has never seen before, Kafka from Honkai: Star Rail!

If you want to use a site other than Gelbooru, click on the Sources button located at the bottom of the window. It's best that you leave only one site checked.

So what should you put in that search bar at the top? For me, I'll type solo -rating:explicit -rating:questionable kafka_(honkai:_star_rail). You don't have to add -rating:questionable, but for me, I want the characters to wear some damn clothes. You may also choose to remove solo if you don't mind setting aside extra time to crop things out. That leaves -rating:explicit. Should you remove it? Well, it depends entirely on you, but for me, I'll leave it. And just because, I'll throw in a shirt tag.

Well this looks promising: 259 results. Hit that Get all button. Switch over to your Downloads tab.

This tab is where you can keep track of what you're planning to download. Before we download, see the bottom left? Choose a folder where you want your downloaded images to go. Then right click an item on the Downloads list and hit Download.

dupeGuru

All done? Great, let's switch over to dupeGuru.

Once it's opened, you're going to add a folder location for it to scan through your images. Click the + symbol at the bottom left and click Add Folder. Choose the folder where your images reside and then click Select Folder. To control how strictly dupeGuru matches duplicate images, go to View > Options or hit Ctrl + P, adjust the filter hardness slider at the top of the Options window, then hit OK. Once you're done with that, set the Application mode at the top to Picture. Hit Scan. When it's finished going through your images, I usually Mark all then delete (Ctrl+A and Ctrl+D if you wanna speedrun).

Note that this is not guaranteed to catch every duplicate image, so you'll still have to look through your dataset.

Curating

Inspect the rest of your images for any duplicates dupeGuru might've missed, and get rid of any bad quality images that might degrade the output of your training. Fortunately for me, Kafka has plenty of good quality images, so I'll be selecting at most 100! Try going for other angles like from side and from behind, so that Stable Diffusion doesn't have to guess what they look like at those angles.

If you have images you really want to use, but found yourself in these cases:

  • Multiple Views of One Character

  • Unrelated Character(s)

  • Cropped Torso/Legs/etc.

Then you'll have to do a bit of image manipulation. Use any image editing application of your choice.

Integrating Additional Tools into your Workflow

Normally I'd tell you to open up the dataset tag editor first, but then this guide would end up as any other guide. Now we can't have that, can we? Not every image of a character can be found on Gelbooru. Let's go over some hypotheticals and see why I require the additional tools I have at my disposal to get the most out of my models.

Google's Webp

If you find yourself saving WEBPs often, then this might be in your interest.

If you're interested, download the appropriate libraries for your operating system. I use Windows. Next, unzip it somewhere you can remember.

For Windows users, hit that Win key and open up Edit the system environment variables, then click Environment variables at the bottom. Under User variables, click on PATH in the list, then click Edit. You're going to add a new variable and it will point to the \bin\ directory of the library you just downloaded, so for example, it will be C:\your\path\here\libwebp-1.3.0-windows-x64\bin. Hit OK for all of them afterward.

Make a special folder for your WEBPs. Put those little guys in there and open whatever terminal that has its path set to that folder location. Again, since I'm using Windows, I can just hit the bar at the top and then type cmd to open a command terminal for that folder.

Copy and paste the following to the terminal: for %f in (*.webp) do dwebp "%f" -o "%~nf.png"

What this will do is convert your WEBPs and output their copies as PNGs. Cool, take those PNGs and get rid of those disgusting WEBPs.
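If you'd rather drive the same conversion from Python (for example, inside the notebook used later in this guide), here is a minimal sketch. It assumes dwebp is already on your PATH as set up above; the function names build_dwebp_commands and convert_folder are made up for this example.

```python
import subprocess
from pathlib import Path

def build_dwebp_commands(folder):
    """Build one `dwebp input.webp -o output.png` command per WEBP in the folder."""
    commands = []
    for webp in sorted(Path(folder).glob("*.webp")):
        png = webp.with_suffix(".png")  # same name, .png extension
        commands.append(["dwebp", str(webp), "-o", str(png)])
    return commands

def convert_folder(folder):
    """Run every command; raises CalledProcessError if dwebp reports a failure."""
    for command in build_dwebp_commands(folder):
        subprocess.run(command, check=True)
```

Calling convert_folder(r"C:\your\path\here\webps") should leave a PNG next to each WEBP, just like the one-liner above.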

Python Notebooks

You may have some trouble if you don't know the basics of Python.

Hey, remember those PNGs you just got? Sometimes images have transparent backgrounds, which can affect the results of your generations if your data is full of them. It may not matter since other images in your data have filled in backgrounds, but this has become a routine of mine anyway. Let's get rid of those transparent backgrounds, so set yourself up another special folder and copy its folder path. Create a new Python notebook, then copy and paste the following code:

from PIL import Image
import os

def add_white_background(input_dir, output_dir):
    # Create the output folder if it doesn't exist, otherwise save() fails
    os.makedirs(output_dir, exist_ok=True)

    for filename in os.listdir(input_dir):
        # Match .png and .PNG alike
        if filename.lower().endswith('.png'):
            # Open the image
            image_path = os.path.join(input_dir, filename)
            img = Image.open(image_path)

            # Check if the image has an alpha channel (transparent background)
            if img.mode in ('RGBA', 'LA') or (img.mode == 'P' and 'transparency' in img.info):
                # Create a new image with a white background
                rgba = img.convert('RGBA')
                bg = Image.new('RGB', img.size, (255, 255, 255))
                # Paste the image on top, using its alpha channel as the mask
                bg.paste(rgba, mask=rgba.split()[3])

                # Save the image with a white background in the output directory
                output_filename = f"white_{filename}"
                output_path = os.path.join(output_dir, output_filename)
                bg.save(output_path)
                print(f"Saved: {output_path}")

input_directory = r"INPUT FOLDER"
input_directory = input_directory.replace("\\", "/")
output_directory = os.path.join(input_directory, "..", "OUTPUT FOLDER")

add_white_background(input_directory, output_directory)

Replace INPUT FOLDER with the folder containing images that have transparent backgrounds, then replace OUTPUT FOLDER with the name of the folder where you want the output images to go (it sits next to the input folder, not inside it). Execute the cell and every PNG that had transparency will be saved there with a filled white background and a white_ prefix on its filename.


This cell checks for images that are smaller than the specified resolution, so you can upscale them later:

import os
import shutil
from PIL import Image

MIN_SIDE = 768  # images with a width or height below this get moved
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.webp', '.bmp')

def check_image_dimensions(image_path):
    with Image.open(image_path) as img:
        width, height = img.size
        return width, height

input_folder = r"INPUT FOLDER"
input_folder = input_folder.replace("\\", "/")
output_folder = os.path.join(input_folder, "..", "upscale")

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

for file_name in os.listdir(input_folder):
    file_path = os.path.join(input_folder, file_name)
    # Skip subfolders and non-image files (e.g., caption .txt files)
    if not os.path.isfile(file_path) or not file_name.lower().endswith(IMAGE_EXTENSIONS):
        continue
    width, height = check_image_dimensions(file_path)
    if width < MIN_SIDE or height < MIN_SIDE:
        # Move the image out of the dataset and into the upscale folder
        shutil.move(file_path, os.path.join(output_folder, file_name))

print("Image processing complete.")

The above code goes through the folder where your training images reside and moves the undersized ones to a new folder called upscale, located next to it. In this case, if either the width or height of an image is less than 768, it gets moved.

Improving the Quality of Your Output Model

If you think your dataset is good enough and you're not planning on training at a resolution greater than 512, then skip this step.

I found that some of my LoRAs tend to perform decently well when the images in the dataset have been upscaled to or above my training resolution. I train my LoRAs at 768 since I think it's a good midpoint between 512 and 1024 (and it's also a common width and/or height to generate at). Do note it takes longer to train when you up the training resolution. Use any upscaling method you want as long as the result looks clear and sharp. For me, I use Batch Process in Stable Diffusion to upscale multiple images.

If you're screencapping an old and blurry anime, you should download this. Make sure to set Resize to 1 when using it.

Dataset Tag Editor

Alrighty, we're basically halfway there. Assuming your Stable Diffusion webui is open and the extension's/standalone's ready, let's start tagging your dataset.
Here's what mine currently looks like:

You should uncheck the box where it says Backup original text file (original file will be renamed like filename.000, .001, .002, ...).

CivitAI downsizes the images in this article, so to make things more clear:

  • I set Use Interrogator Caption to If Empty

  • I use the wd-v1-4-swinv2-tagger-v2 interrogator

  • I leave the custom WDv1.4 threshold at 0.35, which is the default

  • I sort my tags via Frequency and in Descending order

You can have these settings as default the next time you open up the webui if you click on Reload/Save Settings (config.json) then Save current settings.

Now copy and paste the folder path where your training images reside in over to Dataset directory. Click Load and then wait.

Once it's all loaded, your tags should look something like this:

These are the top tags the tagger has picked up on.

Wait, I have underscores in my tags! You can remove all the underscores by going to the Batch Edit Captions tab, then heading down to the section where it reads, "Search and Replace for all images displayed." Simply put _ inside the Search Text field, and then add a space in the Replace Text field. Toggle Each Tags then hit Search and Replace.


So what's next? Setting up a trigger word. Go to Batch Edit Captions and check Prepend additional tags (this can be a default setting too). With it checked, whatever you add goes to the beginning of every single text file. Leave it unchecked when you're only adding tags you feel are missing, since those don't need to come first. In the Edit Tags box, give your model a trigger and then click Apply changes to filtered images. For me, I'll do kafka. Warning: If your trigger contains a real word, like penguin tag999, then your generations may start including penguins, possibly due to the token penguin, so use a unique tag whenever you need to.


Head back over to your tags. I usually prune things like hair length, hair color, and eye color so that those features are associated with the trigger word I just added. How do you prune? Choose one tag from the list and head back over to Batch Edit Captions then Remove.

Click each tag you want to prune, then click on Remove selected tags. Do note you're sacrificing some level of flexibility with your model, especially when you prune clothing tags. Again, if there are some missing tags, add them and make sure it's something Stable Diffusion can recognize when you prompt the tag you added. Remove any irrelevant tags, but be careful not to remove tags that occur too frequently in your dataset (e.g., white background). Keep any tags that you think Stable Diffusion could benefit learning from (e.g., halo).
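If you'd like to see what these tagging steps do to a single caption, here is a rough sketch in plain Python. It is not how the dataset tag editor works internally; the edit_caption function and the sample caption are made up for illustration, and it assumes captions are comma-separated booru tags.

```python
def edit_caption(caption, trigger, prune=()):
    """Swap underscores for spaces, drop pruned tags, and put the trigger first."""
    # Split on commas and normalize each tag
    tags = [tag.strip().replace("_", " ") for tag in caption.split(",") if tag.strip()]
    prune_set = set(prune)
    # Drop pruned tags (and any existing copy of the trigger, to avoid duplicates)
    kept = [tag for tag in tags if tag not in prune_set and tag != trigger]
    return ", ".join([trigger] + kept)

# Hypothetical caption as an autotagger might write it
caption = "1girl, purple_hair, long_hair, purple_eyes, sunglasses, white_shirt"
edited = edit_caption(caption, "kafka", prune={"purple hair", "long hair", "purple eyes"})
print(edited)  # kafka, 1girl, sunglasses, white shirt
```

Hair and eye tags are gone, so those features get absorbed by kafka, while sunglasses and white shirt stay promptable. Note that a blanket underscore replacement would also mangle emoticon tags such as ^_^, so check for those first.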

All done? Hit that big ol' Save Changes button at the top left and then click Unload.

Preparing the Oven

I use NovelAI (or equivalently, nai or animefull) as the base training model. I do not know whether training on unpruned or pruned makes a difference.

Again, I use Holostrawberry's LoRA Trainer. Here are my settings (I'll try to include some additional settings for you local folks):

  • folder_structure: Organize by project

  • resolution: 768 (lower it down to 512 if you want to train faster)

  • shuffle_tags: true

  • enable_bucket: true (the trainer will automatically resize images for you)

  • keep_tokens: 1

  • train_batch_size: 2

  • unet_lr: 5e-4 or 0.0005 (learning_rate's the same, you can lower it if you want)

  • text_encoder_lr: 1e-4 or 0.0001 (should be a fifth of your unet LR)

  • lr_scheduler: cosine_with_restarts

  • lr_scheduler_number / Num Restarts: 3

  • lr_warmup_ratio: 0.05 (if you're using constant_with_warmup)

  • min_snr_gamma: enabled / 5.0

  • lora_type: LoRA

  • network_dim: 32

  • network_alpha: 16

  • optimizer: AdamW8Bit

  • optimizer_args: "weight_decay=0.1, betas=[0.9,0.99]"

  • Clip Skip: 2

  • Max Token Length: 225

  • Training Precision: fp16 (bf16 if your hardware supports it)

  • XFormers: True (memory optimizations, leave it off if you don't need it)

  • Cache Latents: True

Now if you're a Colab user like me, you should have a folder called Loras in your Google Drive if you're going to use Organize by project. Make sure your folder structure looks like this: <your folder name> -> dataset where dataset is a subfolder that contains your images and text documents. Once you've checked that your structure's correct, upload it to Google Drive inside the Loras folder.

Now while it's uploading, let's go over how many repeats and epochs you should use. First, how many images do you have? I did say I would choose up to 100 images for my dataset, so let's go over Holostrawberry's reference table.

20 images × 10 repeats × 10 epochs ÷ 2 batch size = 1000 steps

100 images × 3 repeats × 10 epochs ÷ 2 batch size = 1500 steps

400 images × 1 repeat × 10 epochs ÷ 2 batch size = 2000 steps

1000 images × 1 repeat × 10 epochs ÷ 3 batch size ≈ 3333 steps

According to this table, I should set my repeats to 3 and epochs to 10, so that's what I'll be doing. After that, all I really need to do is set the project_name in Google Colab to whatever I named my project folder that's sitting in my Drive. In my case, it's hsr_kafka.
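The arithmetic behind the reference table can be sketched as a tiny helper. This is only an approximation under the table's own formula (images × repeats × epochs ÷ batch size); the actual step count your trainer reports may differ slightly, for example because of bucketing. The name estimate_steps is made up for this example.

```python
def estimate_steps(images, repeats, epochs, batch_size):
    """Approximate total training steps, rounded down."""
    return images * repeats * epochs // batch_size

# The four rows of the reference table: (images, repeats, epochs, batch size)
rows = [(20, 10, 10, 2), (100, 3, 10, 2), (400, 1, 10, 2), (1000, 1, 10, 3)]
for images, repeats, epochs, batch_size in rows:
    steps = estimate_steps(images, repeats, epochs, batch_size)
    print(f"{images} images x {repeats} repeats x {epochs} epochs / {batch_size} = {steps} steps")
```

For my 100 images at 3 repeats and 10 epochs with batch size 2, that works out to 1500 steps.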

For those training locally, this is your folder naming scheme: repeats_projectname where you'll be replacing repeats with the number of repeats and projectname with whatever you want it to be.

  • If your GPU has more than 9 or 10 GB of VRAM, you can train at a resolution of 768 with batch size 2, XFormers enabled. Try not to do anything else that uses up more resources while the trainer is busy. Otherwise, reduce those settings.
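As a small illustration of the repeats_projectname scheme for local training, here is a sketch that builds and reads such folder names. Both helper names are hypothetical, and this is not the trainer's own parsing code, just the naming rule written out.

```python
def build_folder_name(repeats, project_name):
    """Return a dataset folder name such as '3_hsr_kafka'."""
    return f"{repeats}_{project_name}"

def parse_folder_name(folder_name):
    """Split '3_hsr_kafka' into (3, 'hsr_kafka') at the first underscore."""
    repeats, _, project_name = folder_name.partition("_")
    return int(repeats), project_name

print(build_folder_name(3, "hsr_kafka"))  # 3_hsr_kafka
print(parse_folder_name("3_hsr_kafka"))   # (3, 'hsr_kafka')
```

Only the first underscore separates the repeat count, so project names containing underscores are fine.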

Great, I think that settles it, let's run it and let the trainer handle the rest.


Is it Ready?

Is your LoRA finished baking? You can choose to either download a few of your latest epochs or all of them. Either way, you'll be testing to see if your LoRA works.

Head back to Stable Diffusion and start typing your prompt out. For example,

<lora:hsr_kafka-10:1.0>, solo, 1girl, kafka, sunglasses, eyewear on head, jacket, white shirt, pantyhose

Then enable the script, "X/Y/Z plot." Your X type will be Prompt S/R, which will basically search your prompt for the first value you list and replace it with each of the other values. In X values, you'll type something like -10, -09, -08, -07. What this will do is find -10 in your prompt and swap it for -09, -08, and -07 in turn. Then hit Generate and find out which epoch works best for you.

Once you're done choosing your best epoch, you'll be testing which weight works, so for your X values, type something like 1.0>, 0.9>, 0.8>, 0.7>, 0.6>, 0.5>. Hit Generate again.
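To make the search-and-replace idea concrete, here is a rough emulation in Python. It assumes Prompt S/R is a plain text substitution keyed on the first value in the list; prompt_sr is a made-up name, not part of the webui.

```python
def prompt_sr(prompt, values):
    """Return one prompt per value, with the first value swapped for each in turn."""
    search = values[0]
    if search not in prompt:
        raise ValueError(f"{search!r} was not found in the prompt")
    return [prompt.replace(search, value) for value in values]

prompt = "<lora:hsr_kafka-10:1.0>, solo, 1girl, kafka"

# Epoch comparison: -10 becomes -09, -08, -07
epoch_prompts = prompt_sr(prompt, ["-10", "-09", "-08", "-07"])

# Weight comparison: 1.0> becomes 0.9>, 0.8>, ...
weight_prompts = prompt_sr(prompt, ["1.0>", "0.9>", "0.8>"])

for p in epoch_prompts + weight_prompts:
    print(p)
```

Including the > in the weight values keeps the search text from accidentally matching something else in the prompt.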

Your LoRA should ideally work at weight 1.0, but it's okay if it works best around 0.8 since this is your first time after all. Training a LoRA is an experimental game, so you'll be messing around with tagging and changing settings most of the time.

LoRA, LoCon, and LoHa

This brief glossary will try to help you in deciding whether or not it's best to go with training a LoRA model or a LyCORIS model. I won't be covering the nerdy bits, though.

  • LoRA: The default mini-model we all know and love. It's good enough to handle one character with a single outfit, one character with multiple outfits, multiple characters, and singular concepts. Normally I stick with a dim/alpha of 32/16, but you could also get away with 16/8 to save some more storage space. (Also, if you were wondering: network dim basically determines the size of your model, while alpha scales how strongly the learned weights are applied. Lowering dim too much can lose or even worsen some details, though there are some instances where you can get away with 1/1 to achieve a 1 MB LoRA.)

  • LoCon: A LyCORIS model. According to Holostrawberry, it is reportedly good for art styles. You can read more about this in EDG's tutorial. Your dim/alpha should be 16/8 and your conv_dim/conv_alpha should be 8/4.

  • LoHa: A LyCORIS model purportedly good for handling multiple concepts while also reducing bleed and saving storage space. Your dim/alpha should be 8/4 and your conv_dim/conv_alpha should be 4/1.

Both LyCORIS models will take longer to train than a regular LoRA.

Bleeding: Say your character's shirt has a fancy print on it. Now you want to prompt custom clothing like a dress, so you do that. Your character's dress will likely have that print generated on it whether you like that or not. This is one of many examples of bleeding and this example could be chalked up to improper tagging.

Concepts, Styles, Poses, and Outfits

Now that you know the basics of training a character LoRA, what if you want to train a concept, a style, a pose, and/or an outfit? Look for consistency and provide proper tagging.

For concepts: Add an activation tag and prune anything that relates closely to it. Here's an example. Notice that it only takes one tag to prompt a character holding the Grimace Shake. One element that remained consistent is the shake that appears in each image of the dataset. I've pruned tags such as holding and cup.

For styles: I prefer not adding an activation tag, so that all the user needs to do is call the model and prompt away. Just let the autotagger do its work then immediately save & exit. Here's an example. Again, make sure there's style consistency across all images. You'll want to raise up the epochs and test each one.

For poses: Add an activation tag and prune anything that relates closely to it. Here's an example. In the dataset, there was consistency of random characters putting their index fingers together.

For outfits: Add an activation tag and prune anything that relates closely to it. Here's an example. I've pruned tags such as cross and thighhighs.

Multiple Concepts

Sorting

This part will cover how to train a singular character who wears multiple outfits. You can apply the general idea of this method to multiple characters and concepts.

So you have an assortment of images. You're going to want to organize those images into separate folders that each represent a unique outfit.

Now let's say you're left with 4 folders with the following number of images:

  • Outfit #1: 23 images

  • Outfit #2: 49 images

  • Outfit #3: 100 images

  • Outfit #4: 79 images

Let's make things easier. Delete 3 images in the folder for outfit #1, 16 images in #2, and 29 images in #4. I'll elaborate on this later.

Tagging

Now you'll associate each outfit with their own activation tag. Use Zeta from Granblue Fantasy as a guide. These are my triggers for each outfit:

  • zetadef

  • zetasummer

  • zetadark

  • zetahalloween

Of course, I've pruned hair color, hair length, and eye color, but I've also left out hair style and clothing tags. You can choose to prune these and bake them into each activation tag.

Training Settings

Remember when I told you to delete a specific number of images in that hypothetical dataset of yours? What you'll be doing is trying to train each outfit equally, despite the differences in their image count. Here are the updated folders:

  • Outfit #1: 20 images

  • Outfit #2: 33 images

  • Outfit #3: 100 images

  • Outfit #4: 50 images

If I were Holostrawberry, I'd suggest using the following repeats for each folder:

  • Outfit #1: 5 repeats

  • Outfit #2: 3 repeats

  • Outfit #3: 1 repeat

  • Outfit #4: 2 repeats
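One way to arrive at repeats like these is to aim every folder at roughly the same images × repeats total. The sketch below assumes the target is the size of the largest folder (100 here); that rule of thumb and the name balance_repeats are my own illustration, not something taken from Holostrawberry's notebook.

```python
def balance_repeats(image_counts):
    """Pick repeats so each folder contributes about as many images per epoch."""
    target = max(image_counts.values())  # assumption: match the largest folder
    return {name: max(1, round(target / count)) for name, count in image_counts.items()}

counts = {"outfit1": 20, "outfit2": 33, "outfit3": 100, "outfit4": 50}
repeats = balance_repeats(counts)
print(repeats)  # {'outfit1': 5, 'outfit2': 3, 'outfit3': 1, 'outfit4': 2}
```

This is also why trimming the folders helps: 20, 33, and 50 divide into 100 far more cleanly than 23, 49, and 79 do.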

If you're using his Colab notebook, head down to the section where it says, "Multiple folders in dataset." Here's what your cell should look like:

custom_dataset = """
[[datasets]]

[[datasets.subsets]]
image_dir = "/content/drive/MyDrive/Loras/PROJECTNAME/outfit1"
num_repeats = 5

[[datasets.subsets]]
image_dir = "/content/drive/MyDrive/Loras/PROJECTNAME/outfit2"
num_repeats = 3

[[datasets.subsets]]
image_dir = "/content/drive/MyDrive/Loras/PROJECTNAME/outfit3"
num_repeats = 1

[[datasets.subsets]]
image_dir = "/content/drive/MyDrive/Loras/PROJECTNAME/outfit4"
num_repeats = 2

"""

Let's do some math: (20 × 5) + (33 × 3) + (100 × 1) + (50 × 2) = 399

We'll label this number as T. Now let's determine how many epochs we should get. This is what I usually turn to:

200 T × 17 epochs ÷ 2 batch size = 1700 steps

300 T × 12 epochs ÷ 2 batch size = 1800 steps

400 T × 10 epochs ÷ 2 batch size = 2000 steps

Our T is closest to the last row, so we'll run with 10 epochs.
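The same math can be written as a short sketch: sum images × repeats over the folders to get T, then pick the epoch count from the nearest row of the table above. The names total_per_epoch and pick_epochs are hypothetical, and the table only holds the three rows I listed.

```python
# T -> epochs, copied from the three rows above
EPOCH_TABLE = {200: 17, 300: 12, 400: 10}

def total_per_epoch(folders):
    """Sum images x repeats over every (images, repeats) folder pair."""
    return sum(images * repeats for images, repeats in folders)

def pick_epochs(t):
    """Use the epoch count from the table row whose T is closest."""
    closest = min(EPOCH_TABLE, key=lambda row: abs(row - t))
    return EPOCH_TABLE[closest]

folders = [(20, 5), (33, 3), (100, 1), (50, 2)]
t = total_per_epoch(folders)
epochs = pick_epochs(t)
steps = t * epochs // 2  # batch size 2

print(t, epochs, steps)  # 399 10 1995
```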


If you're using Derrian's LoRA Easy Training Scripts, you would see something like this:

This is how you'll control the repeats for each folder. You can prepend a number to each folder's name to automatically input the number of repeats.

With that out of the way, start the trainer!

Using Generated Images for Training

Can it be done? Yes, absolutely, for sure. We even created an anime mascot for this website!

If you're working to better your models, you should choose your best generations (i.e., the most accurate representation of your model). Inspect your images carefully, since Stable Diffusion alone is already bad enough with hands. A carelessly picked dataset will only make your next generations worse.

Prodigy

The Prodigy optimizer is a supposed successor to the DAdaptation optimizers and it's been out for quite some time. Two of my early uses of it were Wendy (Herrscher of Wind) and Princess Bullet Bill. It is aggressive in the way it learns and it's recommended for small datasets (I'll typically throw around 20 to 30 images at it, give or take). The optimizer is great but not terribly amazing by any means, since it seems to mess up some details like tattoos. If you want to mess around with it, here are the settings you would modify:

  • optimizer: Prodigy

  • learning_rate: 0.5 or 1 (unet lr and tenc lr are the same)

  • network_dim/network_alpha: 32/32 or 16/16

  • lr_scheduler: constant_with_warmup

  • lr_warmup_ratio: 0.05

  • optimizer_args: "decouple=True, weight_decay=0.01, betas=[0.9,0.999], d_coef=2, use_bias_correction=True, safeguard_warmup=True"

For repeats, try shooting for about 100 steps per epoch, counting images × repeats. For example, if I have 20 images, I would go with 5 repeats. For epochs, just set it to 20. While it's training, you'll see loss like here:

When it's nearly done training, look for the model with its loss at its lowest point. It'll typically be around 800 steps or so.

(Old) Final Thoughts

Hello, if you made it here, then thank you for taking the time to read the article. I did promise to make this article and share everything that I've done since that announcement. I did rush some things toward the end, though, so this article is not completely final just yet. If you have any questions or criticisms, please let me know! If there's something that you think can be done more efficiently, please let me know! Treat this as a starting point to your way of training LoRAs. Not everything here is perfect and no method in training LoRAs is ever perfect.

And remember, making LoRAs is an experimentation game.
