Update (8/17/2023):
I hadn't included the part about removing the underscores after using the autotagger, so I've added it.
I've also added a note that I use the NAI checkpoint to train my models.
How often do I change my settings? I answer this question down in the Preparing the Oven section.
Preface
If you want to dedicate only 30-40 minutes of your time to making a simple character LoRA, I highly recommend checking out Holostrawberry's guide. It is very straightforward, but if you're interested in looking into almost everything that I do and you're willing to set aside more time, continue reading. This guide piggybacks a bit off of his guide and Colab Notebooks, but you should still be able to get the gist of everything else I'm about to cover.
The cover image uses FallenIncursio's Pepe Silvia Meme if you're wondering.
Introduction
Here's what this article currently covers:
Pre-requisites
Basic Character LoRA
Additional Tools
LoRA, LoCon, LoHa: A Brief Glossary
Concepts, Styles, Poses, and Outfits
Multiple Concepts
Training Models Using Generated Images
Baby Steps: Your First Character
We won't start off too ambitious here, so let's start out simple: baking your first character.
Here's a list of things that I use.
A Stable Diffusion webui of your choice (Automatic1111, Vladmandic, etc...)
Holostrawberry's LoRA Trainer (for those wanting to train locally: kohya_ss)
Grabber
dupeGuru
Dataset Tag Editor (extension or standalone)
Google's libwebp (for converting WEBPs)
A Python notebook environment
"Woah, woah! Seven bullet points? Seems a little bit much, don't you think?"
I'll get to some of those bullet points real soon, but here's what you're gonna get first: Grabber, dupeGuru, and the dataset tag editor. You'll be using Grabber, an easy-to-use tool, to download images off of sites like Gelbooru (and Rule34, if you really wanted). Then you'll use dupeGuru to remove any duplicate images that may negatively impact your training, and finally send the remainder of your images straight to the dataset tag editor.
Grabber
I use Gelbooru to download the images. You're familiar with booru tags, right? Hope your knowledge of completely nude and arms behind head carries you into this next section.
Got a character in mind? Great! Let's say I'll be training on a completely new character this website has never seen before, Kafka from Honkai: Star Rail!
If you want to use a different site other than Gelbooru, click on the Sources button located at the bottom of the window. It's best that you leave only one site checked.
So what should you put in that search bar at the top? For me, I'll type solo -rating:explicit -rating:questionable kafka_(honkai:_star_rail). You don't have to add -rating:questionable, but for me, I want the characters to wear some damn clothes. You may also choose to remove solo if you don't mind setting aside extra time to crop things out. That leaves -rating:explicit; should you remove it? It depends entirely on you, but for me, I'll leave it. And just because, I'll throw in a shirt tag.
Well this looks promising: 259 results. Hit that Get all button. Switch over to your Downloads tab.
This tab is where you can keep track of what you're planning to download. Before you download, look at the bottom left: choose a folder where you want your downloaded images to go. Then right-click an item on the Downloads list and hit Download.
dupeGuru
All done? Great, let's switch over to dupeGuru.
Once it's opened, you're going to add a folder location for it to scan through your images. Click the + symbol at the bottom left and click Add Folder. Choose the folder where your images reside and then click Select Folder. If you want to adjust how strictly dupeGuru detects duplicate images, go to View > Options or hit Ctrl + P, set the filter hardness at the top of the Options window, then hit OK. Mine is set at 97. Once you're done with that, set the Application mode at the top to Picture. Hit Scan. When it's finished going through your images, I usually Mark all then delete (Ctrl+A and Ctrl+D if you wanna speedrun).
Note that this is not guaranteed to catch every duplicate image, so you'll still have to look through your dataset.
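If you want a quick pre-pass that only catches exact, byte-for-byte duplicates, a few lines of Python will do it before you even open dupeGuru. This is just a sketch with a placeholder folder path, and it won't catch resized or re-encoded copies; that fuzzy matching is exactly what dupeGuru is for.

```python
import hashlib
from pathlib import Path

folder = Path(r"your\dataset\folder")  # placeholder path
seen = {}

for file in sorted(folder.iterdir()):
    if file.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
        continue
    digest = hashlib.sha256(file.read_bytes()).hexdigest()
    if digest in seen:
        print(f"Exact duplicate: {file.name} == {seen[digest]}")
        file.unlink()  # delete the copy; comment this out to only report
    else:
        seen[digest] = file.name
```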
Curating
Inspect the rest of your images for any duplicates dupeGuru might've missed, and get rid of any low-quality images that might degrade the output of your training. Fortunately for me, Kafka has plenty of good quality images available, so I'll be selecting at most 100! Try going for other angles like from side and from behind, so that Stable Diffusion doesn't have to guess what the character looks like from those angles.
If you have images you really want to use, but found yourself in these cases:
Multiple Views of One Character
Unrelated Character(s) that Are Barely in Frame
Visible Body Cutoff
Then you'll have to do a bit of image manipulation. Use any image editing application of your choice.
Here's how I'll deal with the first bullet point: make one copy of the image file for each "view" of the character, then crop each copy down to a single view. Do not do this if the resulting images come out too small for training.
For the second, all you need to do is crop the image so that those unrelated characters are no longer there. A built-in photo editor can help achieve this in seconds.
Sometimes artists like to visibly "cut off" certain parts of the character's body (e.g., torso, legs). All you need to do is crop the image so the cutoff is no longer visible. Easy.
If there's a really wide shot of a character, you can crop it way down to them.
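If you'd rather script a crop than open an editor for every image, Pillow can handle it. This is only a sketch with a made-up filename and crop box, since the coordinates depend entirely on the image; here it splits off the left half of a two-view image.

```python
from PIL import Image

img = Image.open("multi_view.png")  # hypothetical filename
# The crop box is (left, upper, right, lower) in pixels; pick it per image.
left_view = img.crop((0, 0, img.width // 2, img.height))
left_view.save("multi_view_left.png")
```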
Integrating Additional Tools into your Workflow
Normally I'd tell you to open up the dataset tag editor first, but then this guide would end up as any other guide. Now we can't have that, can we? Not every image of a character can be found on Gelbooru. Let's go over some hypotheticals and see why I use these additional tools to get the most out of my models.
Google's Webp
If you find yourself saving WEBPs often, then this might be in your interest.
If you're interested, download the appropriate libraries for your operating system. I use Windows. Next, unzip it somewhere you can remember.
For Windows users, hit that Win key and open up Edit the system environment variables, then click Environment Variables at the bottom. Under User variables, click on Path in the list, then click Edit. You're going to add a new entry that points to the \bin\ directory of the library you just downloaded, so for example, it will be C:\your\path\here\libwebp-1.3.0-windows-x64\bin. Hit OK on all the windows afterward.
Make a special folder for your WEBPs. Put those little guys in there and open whatever terminal that has its path set to that folder location. Again, since I'm using Windows, I can just hit the bar at the top and then type cmd to open a command terminal for that folder.
Copy and paste the following to the terminal: for %f in (*.webp) do dwebp "%f" -o "%~nf.png"
What this will do is convert your WEBPs and output their copies as PNGs. Cool, take those PNGs and get rid of those disgusting WEBPs.
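If you'd rather not set up libwebp at all, most recent Pillow builds can open WEBPs directly, so the same conversion can be done from a notebook. Treat this as an alternative sketch (it assumes your Pillow install has WebP support, and the folder path is a placeholder):

```python
from pathlib import Path
from PIL import Image

webp_folder = Path(r"your\webp\folder")  # placeholder path

for webp_file in webp_folder.glob("*.webp"):
    png_path = webp_file.with_suffix(".png")
    Image.open(webp_file).convert("RGBA").save(png_path)
    webp_file.unlink()  # remove the original WEBP once the PNG exists
    print(f"Converted: {webp_file.name} -> {png_path.name}")
```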
Python Notebooks
You may have some trouble if you don't know the basics of Python.
Hey, remember those PNGs you just got? Sometimes images have transparent backgrounds, which can affect the results of your generations if your data is full of them. It may not matter since other images in your data have filled in backgrounds, but this has become a routine of mine anyway. Let's get rid of those transparent backgrounds, so set yourself up another special folder and copy its folder path. Create a new Python notebook, then copy and paste the following code:
from PIL import Image
import os

def add_white_background(input_dir, output_dir):
    # Make sure the output folder exists before saving into it
    os.makedirs(output_dir, exist_ok=True)
    for filename in os.listdir(input_dir):
        if filename.endswith('.png'):
            # Open the image
            image_path = os.path.join(input_dir, filename)
            img = Image.open(image_path)
            # Check if the image has an alpha channel (transparent background)
            if img.mode in ('RGBA', 'LA') or (img.mode == 'P' and 'transparency' in img.info):
                # Create a new image with a white background
                bg = Image.new('RGB', img.size, (255, 255, 255))
                bg.paste(img, mask=img.convert('RGBA').split()[3])
                # Save the image with a white background in the output directory
                output_filename = f"white_{filename}"
                output_path = os.path.join(output_dir, output_filename)
                bg.save(output_path)
                print(f"Saved: {output_path}")

# Specify the input and output directories
input_directory = r"your\input\folder"
output_directory = r"your\output\folder"
input_directory = input_directory.replace("\\", "/")
output_directory = output_directory.replace("\\", "/")

# Call the function
add_white_background(input_directory, output_directory)
All you really need to do after is paste in the path of the folder that contains only your transparent-background images, execute the cell, and watch the magic happen (it just fills the background with white). Put the results back into your training data. If for whatever reason you run into an issue, you can simply convert the images to JPG:
import os
from PIL import Image

def convert_png_to_jpg(folder_path):
    for filename in os.listdir(folder_path):
        if filename.endswith(".png"):
            png_path = os.path.join(folder_path, filename)
            jpg_path = os.path.splitext(png_path)[0] + ".jpg"
            # Open the PNG image and convert it to RGB mode
            png_image = Image.open(png_path).convert("RGB")
            # Save the image as JPG
            png_image.save(jpg_path, "JPEG")
            # Remove the original PNG file
            os.remove(png_path)

folder_path = r"your\path\here"
folder_path = folder_path.replace("\\", "/")
convert_png_to_jpg(folder_path)
My last cell deals with images that could use some upscaling. I'd rather not spend any more time looking over each image that may be too small, so here's what I have:
import os
from PIL import Image
import shutil

# Function to check the dimensions of an image
def check_image_dimensions(image_path):
    with Image.open(image_path) as img:
        width, height = img.size
        return width, height

# Function to copy files to another folder
def copy_file(source_path, destination_folder):
    filename = os.path.basename(source_path)
    destination_path = os.path.join(destination_folder, filename)
    shutil.copy(source_path, destination_path)
    print(f"File '{filename}' copied to '{destination_folder}'")

# Folder paths
input_folder = r"your\training\folder"
input_folder = input_folder.replace("\\", "/")
output_folder = r"your\path\here"
output_folder = output_folder.replace("\\", "/")

# Create the output folder if it doesn't exist
if not os.path.exists(output_folder):
    os.makedirs(output_folder)

# Iterate over the files in the input folder
for file_name in os.listdir(input_folder):
    file_path = os.path.join(input_folder, file_name)
    if os.path.isfile(file_path):
        width, height = check_image_dimensions(file_path)
        if width < 768 or height < 768:
            copy_file(file_path, output_folder)
            os.remove(file_path)
            print(f"Original file '{file_name}' deleted")

print("Image processing complete.")
The above code goes through your training folder and moves images out based on their dimensions: if either the width or height of an image is less than 768, it gets copied to the output folder and deleted from the training folder, ready for upscaling.
Improving the Quality of Your Output Model
If you think your dataset is good enough and you're not planning on training at a resolution greater than 512, then feel free to skip this step.
I found that some of my LoRAs tend to perform decently well when the images in the dataset have been upscaled to or above my training resolution. I train my LoRAs at 768 since it's a good midpoint between 512 and 1024 (and it's also a common width and/or height to generate at). Do note it takes longer to train when you raise the training resolution. Use any upscaling method you want as long as the results look clear and sharp. For me, I use Stable Diffusion's Batch Process to upscale multiple images.
If you're screencapping an old and blurry anime, you may need to download this. Make sure to set Resize to 1 when using it.
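If you just want a quick scripted resize to bring small images up to the training resolution, a plain Lanczos resize like the sketch below works in a pinch, though a proper upscaler (the webui's Batch Process or an ESRGAN model) will look noticeably sharper. The folder path is a placeholder and the 768 target matches the resolution I train at; note that it overwrites the files in place.

```python
from pathlib import Path
from PIL import Image

folder = Path(r"your\small\images")  # placeholder path
target = 768  # shortest side ends up at least this large

for file in folder.glob("*.png"):
    img = Image.open(file)
    scale = target / min(img.width, img.height)
    if scale > 1:
        new_size = (round(img.width * scale), round(img.height * scale))
        img.resize(new_size, Image.LANCZOS).save(file)  # overwrites in place
        print(f"Upscaled {file.name} to {new_size}")
```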
Dataset Tag Editor
Alrighty, we're basically halfway there. Assuming your Stable Diffusion webui is open and the extension's/standalone's ready, let's start tagging your dataset.
Here's what mine currently looks like:
You should uncheck the box where it says Backup original text file (original file will be renamed like filename.000, .001, .002, ...).
CivitAI downsizes the images in this article, so to make things more clear:
I set Use Interrogator Caption to If Empty
I use the wd-v1-4-swinv2-tagger-v2 interrogator
I use a custom WDv1.4 threshold of 0.5, but you may have to set it lower according to the tags you get
I sort my tags via Frequency and in Descending order
You can have these settings as default the next time you open up the webui if you click on Reload/Save Settings (config.json) then Save current settings.
Now copy the path of the folder where your training images reside and paste it into Dataset directory. Click Load and then wait.
Once it's all loaded, your tags should look something like this:
These are the top tags the tagger has picked up on.
Wait, I have underscores in my tags! You can remove all the underscores by going to the Batch Edit Captions tab, then heading down to the section that reads "Search and Replace for all images displayed." Simply put _ in the Search Text field and a single space in the Replace Text field. Toggle Each Tags, then hit Search and Replace.
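If you'd rather do this outside the webui, the same search-and-replace over the caption .txt files is only a couple of lines of Python. A minimal sketch, with a placeholder folder path:

```python
from pathlib import Path

caption_folder = Path(r"your\training\folder")  # placeholder path

for txt_file in caption_folder.glob("*.txt"):
    text = txt_file.read_text(encoding="utf-8")
    txt_file.write_text(text.replace("_", " "), encoding="utf-8")
```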
So what's next? Setting up a trigger word. Go to Batch Edit Captions and check Prepend additional tags (this can be a default setting too). This is how you add tags to the beginning of every single text file. If you just want to add any necessary tags you feel are missing, leave it unchecked. In the Edit Tags box, give your model a trigger and then click Apply changes to filtered images. For me, I'll do kafka. Warning: if your tag contains something like penguin tag999, then your generations may include penguins, possibly due to the token penguin, so use a unique tag whenever you need to.
Head back over to your tags. I usually prune things like hair length, hair color, and eye color so that those features are associated with the trigger word I just added. How do you prune? Choose one tag from the list and head back over to Batch Edit Captions then Remove.
Click each tag you want to prune, then click on Remove selected tags. Do note you're sacrificing some level of flexibility with your model, especially when you prune clothing tags. Again, if there are some missing tags, add them and make sure it's something Stable Diffusion can recognize when you prompt the tag you added.
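The prepend-and-prune step can also be scripted if you prefer. Here's a rough sketch of what the tag editor is doing to the caption files; the folder path, the kafka trigger, and the hair/eye tags in the prune set are just examples from this guide, so swap in your own.

```python
from pathlib import Path

caption_folder = Path(r"your\training\folder")       # placeholder path
trigger = "kafka"                                     # your activation tag
prune = {"long hair", "purple hair", "purple eyes"}   # hypothetical tags to prune

for txt_file in caption_folder.glob("*.txt"):
    tags = [t.strip() for t in txt_file.read_text(encoding="utf-8").split(",")]
    tags = [t for t in tags if t and t != trigger and t not in prune]
    txt_file.write_text(", ".join([trigger] + tags), encoding="utf-8")
```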
All done? Hit that big 'ol Save Changes button at the top left and then click Unload.
Preparing the Oven
I use NovelAI (or equivalently, nai or animefull) as the base training model.
Again, I use Holostrawberry's LoRA Trainer. Here are my settings (I'll try to include some additional settings for you local folks; there's also a command sketch for local training right after this list):
folder_structure: Organize by project
resolution: 768 (lower it down to 512 if you want to train faster but sacrifice catching details)
shuffle_tags: true
enable_bucket: true (the trainer will automatically resize images for you)
keep_tokens: 1
train_batch_size: 2
unet_lr: 5e-4 or 0.0005 (learning_rate's the same)
text_encoder_lr: 1e-4 or 0.0001
lr_scheduler: cosine_with_restarts
lr_scheduler_number: 3
lr_warmup_ratio: 0.05 (if you're using constant_with_warmup)
min_snr_gamma: enabled (5.0 for local)
lora_type: LoRA
network_dim: 32
network_alpha: 16
optimizer: AdamW8Bit
optimizer_args: "weight_decay=0.1"
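For the local folks, here's roughly how those settings map onto kohya's sd-scripts (train_network.py), wrapped in a small Python launcher. This is a sketch from memory rather than a verified command, so double-check the flag names against the sd-scripts README for your version; every path, the checkpoint filename, and the output name are placeholders.

```python
import subprocess

# Approximate mapping of the settings above onto sd-scripts flags.
# Paths, the checkpoint filename, and output_name are placeholders.
cmd = [
    "accelerate", "launch", "train_network.py",
    "--pretrained_model_name_or_path", r"path\to\animefull-final-pruned.ckpt",
    "--train_data_dir", r"path\to\project\dataset",
    "--output_dir", r"path\to\output",
    "--output_name", "hsr_kafka",
    "--network_module", "networks.lora",
    "--network_dim", "32", "--network_alpha", "16",
    "--resolution", "768,768", "--enable_bucket",
    "--train_batch_size", "2", "--max_train_epochs", "10",
    "--unet_lr", "5e-4", "--text_encoder_lr", "1e-4",
    "--lr_scheduler", "cosine_with_restarts", "--lr_scheduler_num_cycles", "3",
    "--optimizer_type", "AdamW8bit", "--optimizer_args", "weight_decay=0.1",
    "--min_snr_gamma", "5",
    "--shuffle_caption", "--keep_tokens", "1",
    "--caption_extension", ".txt",
    "--mixed_precision", "fp16", "--save_every_n_epochs", "1",
]
subprocess.run(cmd, check=True)
```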
Now if you're a Colab user like me, you should have a folder called Loras in your Google Drive if you're going to use Organize by project. Make sure your folder structure looks like this: <your folder name> -> dataset, where dataset is a subfolder that contains your images and text documents. Once you've checked that your structure is correct, upload it to Google Drive inside the Loras folder.
Now while it's uploading, let's go over how many repeats and epochs you should use. First, how many images do you have? I did say I would choose up to 100 images for my dataset, so let's go over Holostrawberry's reference table.
20 images × 10 repeats × 10 epochs ÷ 2 batch size = 1000 steps
100 images × 3 repeats × 10 epochs ÷ 2 batch size = 1500 steps
400 images × 1 repeat × 10 epochs ÷ 2 batch size = 2000 steps
1000 images × 1 repeat × 10 epochs ÷ 3 batch size = 3300 steps
According to this table, I should set my repeats to 3 and epochs to 10, so that's what I'll be doing. After that, all I really need to do is set the project_name in Google Colab to whatever I named my project folder that's sitting in my Drive. In my case, it's hsr_kafka.
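If you want to sanity-check the step count with your own numbers, it's just images × repeats × epochs ÷ batch size (the trainer's exact count can differ slightly because of how batches round, but this is close enough for planning):

```python
def training_steps(images, repeats, epochs, batch_size):
    # images * repeats samples per epoch, divided across batches
    return images * repeats * epochs // batch_size

print(training_steps(100, 3, 10, 2))  # 1500, matching the table above
```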
For those training locally, this is your folder naming scheme: repeats_projectname, where you'll replace repeats with the number of repeats and projectname with whatever you want it to be (in my case, that would be 3_hsr_kafka).
Great, I think that settles it, let's run it and let the trainer handle the rest.
What settings do you use for X concept? Other than repeats, epochs, and LoRA type, I don't change settings very often. Mess around with settings like the unet and text encoder learning rates if you want; it's just that the numbers Holo has given have been working out for me for the most part. Feel free to train using the DAdaptation optimizer as well (it adjusts the learning rate automatically), though I don't find myself using it very often.
Is it Ready?
Is your LoRA finished baking? You can choose to either download a few of your latest epochs or all of them. Either way, you'll be testing to see if your LoRA works.
Head back to Stable Diffusion and start typing your prompt out. For example,
<lora:hsr_kafka-10:1.0>, solo, 1girl, kafka, sunglasses, eyewear on head, jacket, white shirt, pantyhose
Then enable the script "X/Y/Z plot." Your X type will be Prompt S/R, which basically searches your prompt for the first thing in your list and replaces it with whatever you tell it to. In X values, you'll type something like -10, -09, -08, -07. What this will do is find the first -10 in your prompt and swap it for -09, -08, and -07 in turn. Then hit Generate and find out which epoch works best for you.
Once you're done choosing your best epoch, you'll be testing which weight works, so for your X values, type something like 1.0>, 0.9>, 0.8>, 0.7>, 0.6>, 0.5>. Hit Generate again.
Your LoRA should ideally work at weight 1.0, but it's okay if it works best around 0.8 since this is your first time after all. Training a LoRA is an experimental game, so you'll be messing around with tagging and changing settings most of the time.
LoRA, LoCon, and LoHa
This brief glossary will try to help you in deciding whether or not it's best to go with training a LoRA model or a LyCORIS model. I won't be covering the nerdy bits, though.
LoRA: The default mini-model we all know and love. It's good enough to handle one character with a single outfit, one character with multiple outfits, multiple characters, and singular concepts. Normally I stick with a dim/alpha of 32/16, but you could also get away with 16/8 to save some more storage space. (Also, if you were wondering: network dim and alpha basically determine the size of your model. Lowering them too much can lose or even worsen some details, though there are some instances where you can get away with 1/1 to achieve a 1 MB LoRA.)
LoCon: The LyCORIS model I understand the least. According to Holostrawberry, it is reportedly good for art styles. You can read more about this in EDG's tutorial. Your dim/alpha should be 16/8 and your conv_dim/conv_alpha should be 8/1.
LoHa: A LyCORIS model reportedly good for handling multiple concepts while also reducing bleed and saving storage space. Your dim/alpha should be 8/4 and your conv_dim/conv_alpha should be 4/1.
Both LyCORIS models will take longer to train than a regular LoRA.
Bleeding: When details from one concept leak into another. Say your character's shirt has a fancy print on it. Now you want to prompt custom clothing like a dress, so you do that. Your character's dress will likely have that print generated on it whether you like it or not.
Concepts, Styles, Poses, and Outfits
Now that you know the basics of training a character LoRA, what if you want to train a concept, a style, a pose, and/or an outfit? It's usually a lot simpler than you think. You just need consistency and proper tagging.
For concepts: Add an activation tag and prune anything that relates closely to it. Here's an example. Notice that it only takes one tag to prompt a character holding the Grimace Shake. One element that remained consistent is the shake that appears in each image of the dataset. I've pruned tags such as holding and cup.
For styles: I prefer not adding an activation tag, so that all the user needs to do is call the model and prompt away. Just let the autotagger do its work, then immediately save & exit. Here's an example. Again, make sure there's style consistency across all images. You'll want to raise the epochs and test each one.
For poses: Add an activation tag and prune anything that relates closely to it. Here's an example. The consistent element in the dataset was random characters putting their index fingers together.
For outfits: Add an activation tag and prune anything that relates closely to it. Here's an example. I've pruned tags such as cross and thighhighs.
Multiple Concepts
Sorting
This part will cover how to train a singular character who wears multiple outfits. You can apply the general idea of this method to multiple characters and concepts.
So you have an assortment of images. You're going to want to organize those images into separate folders that each represent a unique outfit.
Now let's say you're left with 4 folders with the following number of images:
Outfit #1: 23 images
Outfit #2: 49 images
Outfit #3: 100 images
Outfit #4: 79 images
Let's make things easier. Delete 3 images in the folder for outfit #1, 16 images in #2, and 29 images in #4. I'll elaborate on this later.
Tagging
Now you'll associate each outfit with their own activation tag. Use Zeta from Granblue Fantasy as a guide. These are my triggers for each outfit:
zetadef
zetasummer
zetadark
zetahalloween
Of course, I've pruned hair color, hair length, and eye color, but I've left the hair style and clothing tags alone. You can choose to prune those as well and bake them into each activation tag.
Training Settings
Remember when I told you to delete a specific number of images in that hypothetical dataset of yours? What you'll be doing is trying to train each outfit equally, despite the differences in their image count. Here are the updated folders:
Outfit #1: 20 images
Outfit #2: 33 images
Outfit #3: 100 images
Outfit #4: 50 images
Holostrawberry would suggest using the following repeats for each folder (a quick way to arrive at numbers like these follows the list):
Outfit #1: 5 repeats
Outfit #2: 3 repeats
Outfit #3: 1 repeat
Outfit #4: 2 repeats
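Those repeat counts work out to roughly "largest folder divided by this folder, rounded," which is a quick way to eyeball balanced repeats; feel free to nudge them by hand afterward.

```python
# Image counts per outfit folder from the example above
folders = {"outfit1": 20, "outfit2": 33, "outfit3": 100, "outfit4": 50}
largest = max(folders.values())

for name, count in folders.items():
    repeats = max(1, round(largest / count))
    print(f"{name}: {count} images -> {repeats} repeats")
# outfit1: 5, outfit2: 3, outfit3: 1, outfit4: 2 -- matching the list above
```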
If you're using his Colab notebook, head down to the section where it says, "Multiple folders in dataset." Here's what your cell should look like:
custom_dataset = """
[[datasets]]
[[datasets.subsets]]
image_dir = "/content/drive/MyDrive/Loras/PROJECTNAME/outfit1"
num_repeats = 5
[[datasets.subsets]]
image_dir = "/content/drive/MyDrive/Loras/PROJECTNAME/outfit2"
num_repeats = 3
[[datasets.subsets]]
image_dir = "/content/drive/MyDrive/Loras/PROJECTNAME/outfit3"
num_repeats = 1
[[datasets.subsets]]
image_dir = "/content/drive/MyDrive/Loras/PROJECTNAME/outfit4"
num_repeats = 2
"""
Let's do some math: (20 × 5) + (33 × 3) + (100 × 1) + (50 × 2) = 399
We'll label this number as T. Now let's determine how many epochs we should use. This is what I usually turn to:
200 T × 17 epochs ÷ 2 batch size = 1700 steps
300 T × 12 epochs ÷ 2 batch size = 1800 steps
400 T × 10 epochs ÷ 2 batch size = 2000 steps
Our T is closest to the last row, so we'll run with 10 epochs. Feel free to switch to training it as a LoHa.
With that out of the way, start the trainer!
Using Generated Images for Training
Can it be done? Yes, absolutely, for sure.
If you're working to better your models, choose only your best generations (i.e., the most accurate representations of your model). Inspect your images carefully; Stable Diffusion alone is already bad enough with hands. Don't make your next generations worse by neglecting your dataset.
Final Thoughts
Hello, if you made it here, then thank you for taking the time to read the article. I did promise I'd make this article to share everything that I've done since that announcement. I did rush some things toward the end, though, so this article isn't completely final just yet. If you have any questions or criticisms, please let me know! If there's something that you think can be done more efficiently, please let me know! Treat this as a starting point for your own way of training LoRAs. Not everything here is perfect, and no method of training LoRAs ever is.
And remember, making LoRAs is an experimentation game.