Intro
The following article details all my personal findings and recommendations for training a celebrity LoRA for SD1.5. I am also aiming to make this article a good starting point for someone just starting out without any prior knowledge.
Take this guide as a first step towards getting a good workflow going, then experiment to personalize it and make it better for your usage.
There are 2 workflows documented in my other article but for the sake of simplicity, I will only explain the second one here. If you are interested in learning about the OG workflow, please visit here.
Getting Started
Install the following software on your device before doing anything else. Installation guides and troubleshooting guides are available in the provided links.
[Index]
Dataset/Training Data - This is the collection of images that we want the model to train on.
Tags - Given an image in the dataset, tags are how we describe all the elements present in the image that we want SD to look for (and associate) so that we can guide it during training (more on this later).
Software
Install kohya_ss from here.
This software runs the training process for you and can also help with tagging the images. I use kohya_ss for my training, but other solutions like automatic1111 training, comfyui training, sd-scripts and others exist. If you're just starting out, stick with kohya.
Install BooruDatasetTagManager from here.
This software helps you manage tags for the given dataset.
Step 1 - Dataset Preparation
This is the most important step of this process and will decide how well your model turns out. First, start collecting images of the celebrity we want to train on.
We will collect 2 kinds of images
Face closeups
Body images
This is important because we want the model to be able to accurately generate the person in all kinds of images - closeups, portraits, upper body shots, full body shots etc.
Common Tips
Do not include images that are low-res, blurry, badly lit (this includes colored lighting) or have shadowed faces
Do not pick images that include anybody else - no other faces or bodies
Keep the ratio of face images to body images at about 1.5:1 to 2:1 (e.g. 30 face images to 15-20 body images)
Avoid adding multiple images from the same photoshoot (the model will overfit and recreate the same lighting)
Face Images
Search Google Images/Bing Images/Reddit for `<celebrity-name> face`
Download every single image where the face is clearly visible and the resolution is greater than 1k on each side
Pick out at least 30 images - up to 50
Make sure you get multiple images of every hairstyle you want to train for, e.g. blonde curly, black and green, white, pink etc.
If the person wears glasses, get variations of those as well
Once the images are ready, go to presize.io and upload all the images
Crop all face images to a size of 512x512 without losing chunks (no cut-off face, chin, hair etc.) - crop down to the person's neck/shoulders (nothing more)
Download the resulting zip file
Extract it and you will have 30-50 images, out of which you need to pick 25-30
In the final resultant set, include as many angles as possible (left facing, from top, from below, right facing, front) and as many hairstyles as possible
As much as possible, avoid duplicates of the same hairstyle (unless the person consistently has the same one or you want the character to lean towards that hairstyle)
After deleting duplicates and culling the comparatively bad ones, you should be left with 25-30 really high quality images that showcase the person's face very well.
Body Images
All the above rules apply - collect 20-40 images
Crop until the waist or just below, no need to include feet (unless you want that to be replicated 😉)
Crop to a size of 512x768
Include different clothes shots and ensure at least 2-3 standing shots where 90% of the body is visible
If you want the model to replicate a specific body part accurately, include multiple images of it from different angles
No obstructions (nothing in front of the body like a desk, handbag etc.)
Resolution of the original image should be at least 1k on both sides
Include multiple zoom levels - but more of upper body (head to abdomen)
After the selection process, pick 15-20 good images
[Index]
Trigger word - The way SD works is that it looks at an image and its tags and associates one with the other.
For example - given enough images of a tree with the tag `tree`, it learns to associate the general shape of a tree whenever the word `tree` is used in the prompt.
So if we are training a LoRA of Emma Watson and we use the words `emma watson` in the prompt, SD will try to generate its internal association of that word. We don't want that - we instead want to choose a word that SD probably hasn't encountered before, like `3mm4w4ts0n`.
Since this word probably doesn't mean anything to SD, we can isolate our model's training when generating images.
Other guides say this doesn't matter (even a reddit post by the creators of SD) but I use it anyway.
Repeats - This is the number of times we want a single image to be repeated during training. Different guides have different recommendations on picking this number - feel free to experiment.
Folder Structure
Now you need to pick a trigger for the model and create the folders necessary for kohya to train on them.
I suggest using something similar to the following examples
Emma Watson - `3mm4w4ts0n`
Scarlett Johansson - `sc4rj0`
Joe Pesci - `j03p3sc1`
It just has to be a combination of numbers and letters that is different enough from commonly used words in the English language.
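You don't need a script for this, but as an illustration, the examples above follow a simple leetspeak substitution. Here is a toy Python sketch (the substitution map is just an assumption, not a fixed rule):

```python
# Toy sketch: derive a leetspeak-style trigger word from a name.
# The substitution map is an arbitrary choice - any hard-to-collide word works.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0"})

def trigger_word(name: str) -> str:
    return name.lower().replace(" ", "").translate(LEET)

print(trigger_word("Emma Watson"))  # 3mm4w4ts0n
print(trigger_word("Joe Pesci"))    # j03p3sc1
```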
Once all your images are ready, create the following folders and place the images in them.
For the number of repeats, I suggest going with 24 for face images and 16 for body images (more details on this in the following section).
You can create this folder anywhere, just remember where you placed it. I have a `training` folder set up next to the SD installation folder for consistency. Inside this training folder you can keep adding new folders for every model you want to train.
- <trigger-word>
  - images
    - <repeats>_<trigger-word> (for face images)
    - <repeats>_<trigger-word> (for body images)
  - model (for kohya to place the trained models into)
This folder structure essentially tells kohya to train the model for the `trigger-word` and to train the images in each folder for that many repeats.
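For the Emma Watson example with the repeats suggested above, the structure would look like this (image counts and names are illustrative):

- 3mm4w4ts0n
  - images
    - 24_3mm4w4ts0n (25-30 face images)
    - 16_3mm4w4ts0n (15-20 body images)
  - model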
Tagging
Now we need to start tagging our images in order to drill down on exactly what we are training in this model.
Start `kohya_ss` by navigating to its folder and clicking on `gui.bat`
Navigate to this URL in your browser: `http://127.0.0.1:7860/`
Go to `Utilities -> Captioning -> WD14 Captioning`
In the `Image Folder to caption` field, select the face images folder i.e. `24_<trigger-word>`
Add the `trigger-word` in the `Prefix to add to WD14 caption` field, followed by a comma - the comma is used to separate one tag from the next
There are multiple models that we can use for tagging in the `Model` dropdown - we'll stick with the default for now.
Once you click on `Caption Images`, it will download/load the model and tag each image in that folder. Repeat this process for body images as well
Once this process is complete, you will see that in each folder a `.txt` file has been created for each image, with the same name as the image. These text files contain all the tags corresponding to each image.
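As an illustration (the exact tags will vary per image and tagger model), a caption file for one of the Emma Watson face images might look like this:

3mm4w4ts0n, 1girl, solo, brown hair, brown eyes, looking at viewer, smile, portrait, realistic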
Start `BooruDatasetTagManager` by navigating to its folder and clicking on `BooruDatasetTagManager.exe`
Once open, click on `File -> Load Folder` and pick the face images folder.
Once loaded, it will show 3 panes: `Dataset - Image Tags - All Tags`
For every image in the left pane, once selected, it will show you the tags that the model generated for it in the middle pane.
This is where we can do some cleanup
Go through each image and its corresponding tags and try to find anything wrong
For example, if it mistagged a female celebrity as `1boy` or `male focus` - delete the tags using the cross button on the right and save your changes. If there are any tags that don't match anything in the image, get rid of those as well
These next few points are useful if you're training the person with a specific costume/hairstyle/feature
In that case, delete all mentions of the costume/style/feature you want to bake in. For example, if the person has blue eyes and you want to make sure all generations of this model have blue eyes - remove that tag
Essentially, we are baking the blue eyes into the model by leaving that tag out of the tag list. The same goes for costumes/hairstyles etc., as shown in the example below.
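For instance (tags are illustrative), pruning a caption to bake in blue eyes would look like:

Before: 3mm4w4ts0n, 1girl, solo, blue eyes, blonde hair, smile
After: 3mm4w4ts0n, 1girl, solo, blonde hair, smile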
If the character has a specific costume that you want to train on while retaining the original appearance as well:
Include up to 25% images of that costume in both the face and body sets
For example, Margot Robbie/Harley Quinn - 20 face images of Margot and 10 face images as Harley | 15 body images of Margot and 5 body images as Harley
For the Harley images, add an additional tag to differentiate them, like `h4rl3yq`. This will allow the model to generate Harley images when it sees the `h4rl3yq` tag alongside the `trigger-word`.
Once you have repeated this process for body images as well, we are ready for training.
Step 2 - Training
This section can get pretty complicated pretty quickly. I will try to give explanations for each thing I think you should know, but will be skipping over a lot. For detailed explanations of everything, please refer to videos on YouTube, where they will do a much better job than I can.
In this step, we will now train our model on the dataset we just prepared.
Go back to `kohya` in your browser and navigate to the `LoRA` tab.
FYI, we are training our model in such a way that it directly outputs the newly trained data. We can also use Dreambooth training to add onto our existing model and later extract the LoRA out. Many guides suggest that this approach is better, but I haven't tried it - so idk. If you want to try the other method - YouTube.
I have included the settings you will use in `kohya` as an attachment that you can load directly by going to `Configuration File -> Open`, but I suggest you do this manually the first time around to understand the options better.
Under `Source Model -> Model Quick Pick`, select `runwayml/stable-diffusion-v1-5` from the dropdown
from the dropdownThis is where we are selecting the base model on top of which we want to train our dataset on.
Additional context for picking a model (ignore for new users)
I always train on the base SD model because it performs very well when used with different fine-tuned models. This provides flexibility during inference.
BUT you can also pick your favorite model in this step to get better/different results. This isn't a strict rule and experimentation is encouraged.
When training with a different model, you might have to change your repeats/epochs/model strength a bit, but it should work okayish if you stick with these settings as well.
Model specific training is too varied to cover here.
Navigate to the `Folders` tab and select the following -
`Image folder` - select the `images` folder within the parent `trigger-word` folder i.e. `3mm4w4ts0n -> images`. This will pick up all subfolders - both our face and body image folders.
`Output folder` - select the `model` folder within the parent `trigger-word` folder i.e. `3mm4w4ts0n -> model`. This is where kohya will save the trained models.
`Model output name` - add the `trigger-word` here. This is just the name of the file, but it's good practice to keep it the same as the `trigger-word` for organisation.
Navigate to the `Parameters` tab
Since we are training a regular LoRA, we will leave `LoRA type` set to `Standard`
There are other types we can train, like LyCORIS and others - if interested - YouTube
`Train batch size` - This specifies how many images it trains on at one time - we can leave it at 1 for now
This can be increased according to your computer's specifications, but I don't recommend going higher than 4.
`Epoch` - One epoch is basically one round of training over all our images. For example, if we have a total of 10 images and 30 repeats - 300 steps of training = 1 epoch. We want to train our model enough for it to grasp the dataset's features. Pick 10 for now - the step arithmetic is sketched below.
Picking epochs and repeats is a varied topic with many preferences
Detailed explanations for picking epochs can be found on YouTube
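To see how images, repeats, epochs and batch size combine, here is a minimal Python sketch - the image counts are assumptions based on the dataset sizes suggested earlier, not fixed values:

```python
# Sketch of how training steps add up from the folder setup.
# Image counts are assumed from the dataset sizes suggested earlier.
face_images, face_repeats = 25, 24   # images/24_<trigger-word>
body_images, body_repeats = 15, 16   # images/16_<trigger-word>
epochs, batch_size = 10, 1

steps_per_epoch = (face_images * face_repeats + body_images * body_repeats) // batch_size
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 840 steps per epoch, 8400 across 10 epochs
```

Use this to sanity-check your own folder counts before starting a run.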
`Save every N epochs` - we will leave this at 1. This tells kohya to save the training state after every epoch, which will be useful in picking the best version of the model (more on this later on)
`Caption Extension` - set this to `.txt` since this is the extension we used when generating tags.
`Mixed precision`/`Save precision` - set these to `bf16` if you have an RTX 30 series card; if not, leave them at `fp16`
`Optimizer` - set this to `AdamW8bit` if you have an RTX 30 series card; if not, pick `AdamW`
(Ignore for new users) The Prodigy optimizer automatically finds the learning rate and is supposedly better, but I haven't had any luck with it (the models turn out too rigid).
`Network Rank (Dimension)` / `Network Alpha` - set these to `32` each.
Setting these higher/lower will affect how much data the model stores - higher isn't always better
Some guides recommend setting the `Alpha` to half the rank or lower - no luck with these either
`Max Resolution` - set this to `512,512`
`Enable Buckets` - uncheck this
This option is useful if we haven't cropped our dataset to the same sizes
If you want to skip the cropping (don't), then enable this
Click on `Start Training`
General Tips
Try to keep the step count per epoch (images × repeats) to ~1500 or lower.
Play around with the ratio of repeats of face vs body to get better results.
Experimentation is key here - altering the number of images and repeats will help you optimize and find the best model for your needs.
Like I mentioned before, I have skipped over a lot of details/variations that can be made to the parameters above. You can play around with these values to get drastically different results. Please refer to YouTube to see how other people train their models. Detailed explanations and reasoning for each of these parameters can also be found there.
Step 3 - Picking a model
Once training is complete, kohya will have generated some files in our `model` folder, like so: `trigger-word-000001.safetensors`, `trigger-word-000002.safetensors`, ..., `trigger-word-000010.safetensors`.
The number at the end of each file name is the epoch at which it was saved.
Higher isn't always better - we need to pick the model that works best for us.
Here is where we are introduced to 2 effects that take place during training:
Under fitting - If the model hasn't trained long enough on the dataset, using this LoRA will not reproduce the likeness well enough. You can observe this by using that specific epoch and generating images. If the likeness just isn't there - it's a sign of underfitting and we have to move to a higher epoch.
Over fitting - If the model has trained for too long, using the LoRA will result in the images being burnt i.e. it tries to reproduce the training set too much. The output will look messy, the hair/clothing from the original training set will seep into your generations, changing hair color in the prompt has no effect - things like that. This is a sign of overfitting and we have to move to a lower epoch.
With this workflow, I have found that a strength of `0.7-0.8` is ideal and works really well. Now that we know what to look for, we will select the best epoch/version of the file that we want to use.
The basic idea is to generate the same image using each version and compare the output.
Look for signs of under/over fitting in each generation.
Pick the version that looks the best - an example comparison setup is shown below.
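For example (the prompt and epoch suffix are illustrative), in the automatic1111 web UI you can keep one comparison prompt fixed at the same seed and only swap the epoch number inside the LoRA tag:

photo of 3mm4w4ts0n, upper body, looking at viewer, simple background <lora:3mm4w4ts0n-000004:0.7>

Compare the outputs side by side and look for the under/over fitting signs described above.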
When generating images for comparison, you might want to use the model you wish to use for inference i.e. your favorite model that you use for all your images.
If you intend this model to be widely used, you can also test on multiple models and check its performance.
I use `serenityv2` by @malcomrey since it gives me the best looking likeness for this workflow.
`ADetailer`/`FaceDetailer` is a must for generating images that aren't portraits. They help bring back detail in the face after generation.
Now that you've found the best version of this model, you can either finalize that if you're happy with the results or go back and train with different parameters/dataset to get better results.
If the model doesn't turn out as well as you'd hoped - my first step would be to get a better dataset.
The dataset will have a stronger impact than changing a few parameters.
BUT as always, experimentation is key and finding the workflow that best matches your preferences is half the fun.
Step 4 - Profit?
These are just some closing thoughts
There are many other creators on this platform with models much better than the ones you'll get with this workflow, but we all have to start somewhere. Use this as a starting point and try to make it better. Any suggestions/improvements you can make, please share them with the rest of us.
This article is a complete summary of everything I know about training LoRAs for SD1.5. It's certainly not the "best" but personally, I am unsure of what I can change to make it better.
Thank you guys for all the support you've shown me.
Please feel free to comment with any suggestions/feedback on the article.