Celebrity LoRA Training Guide

I use the following steps to create a celebrity LoRA. Posting them here to share and possibly receive some suggestions for improvements.

0. Updates

UPDATE 15: Released my first Flux LoRA here: Alexandra Daddario - [Flux, Pony, SD1.5] - Flux | Stable Diffusion LoRA | Civitai
The output turned out way better than I expected. I had to train on a RunPod instance for 3 hours, which cost under a dollar.
All credit goes to this tool: GitHub - ostris/ai-toolkit: Various AI scripts. Mostly Stable Diffusion stuff.

UPDATE 14: Tried an entirely new workflow for the latest version of my Emma Stone model. The results seem really good. Models from the current latest workflow tend to overcook at strength 1 and aren't very malleable: they will deviate from the training data, but not by much. With this new workflow, not only is the likeness good (even without FaceDetailer), but it also seems to adhere to prompts better. It works just as well in checkpoints other than serenity. Please try the model here https://civitai.com/models/230796?modelVersionId=704297 and leave a comment on whether it is better than the current latest workflow.

UPDATE 13: The PONY model training is going quite well, and I have attached the training json to the article. I haven't changed much from the original training preset recommended by the creator of Pony Realism. Use the same dataset/images/repeats as my latest workflow with this new preset json. Train for at least 12 epochs, and use EulerA/LCM for the facedetailer/inpainting step to get better face likeness.
The original workflow article can be found here: Pony Realism LoRa Training & Preset | Civitai

UPDATE 12: Started training models for PONY XL. The prompt adherence is amazing. The face likeness is not as good as SD1.5 but not terrible either. This Emma Watson model turned out pretty well IMO: Emma Watson / Hermione Granger - [Pony, SD1.5] - Pony | Stable Diffusion LoRA | Civitai
Looking forward to your feedback. SDXL training is new to me, and I hope to improve the workflow with more iterations.

UPDATE 11: Consolidated guide here - Celebrity LoRA Training Guide (Consolidated) - SD 1.5 | Civitai

UPDATE 10: Quick update: it turns out you can get away with fewer than 20 images for training.
Refer to the Claire Forlani and Tara Summers models for the results.
Looking forward to your results. Please reach out with your feedback and help improve this workflow.

UPDATE 9: A common issue with most of my models is that they perform really well in closeups but leave much to be desired in other shots. After some experimentation, I realized that my face dataset images were all extreme closeups. In my latest Angourie Rice model, I changed it up to include slightly more zoomed-out images in the face dataset (including the neck and a bit of the shoulder area). This resulted in much better likeness even without ADetailer. Please refer to the last comment in this article for my process on choosing repeats/image counts. Recently, @Ehtz11 reached out to say that the Prodigy optimizer works really well; I am currently trying this out.

UPDATE 8: Some changes to the latest method of training:
- going with 25 face images (25 repeats) and 15-20 body images (16 repeats)
- this change results in much better likeness.
- You will have to test the models; typically 6th-10th epoch all achieve likeness to various degrees. Use X/Y/Z and pick the best one.
This change was based on @Lucifael's feedback that likeness was lost in anything but closeups.
Face Detailer/ADetailer is still required, but with the increased count - body likeness will be much better.

UPDATE 7: To include a specific style of clothing/character (e.g., Margot Robbie as Harley Quinn or Katheryn Winnick as Lagertha) while also maintaining general likeness, do the following:
- Out of 20 face images, include 3-4 images of that character
- Out of 10 body images, include 2-3 images of the character
- After normal WD14 captioning, add a character tag (like l4g3rth4 or h4rl3yq) to only those images, while keeping the rest of the tags as is
- This will allow the model to generate that character's likeness only when the character tag is used in the prompt.
Here are a couple of examples with this approach. Open for feedback/suggestions on improvements.
Katheryn Winnick / Lagertha - v1.0 | Stable Diffusion LoRA | Civitai
Margot Robbie / Harley Quinn - v1.0 | Stable Diffusion LoRA | Civitai
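
The character-tag step from UPDATE 7 can also be scripted instead of edited by hand. This is only a minimal sketch, assuming WD14 captions are comma-separated tags in a .txt file next to each image. The function names are hypothetical, and putting the tag at the front of the caption is my assumption, not something this workflow requires.

```python
from pathlib import Path


def add_character_tag(caption_path: Path, tag: str) -> str:
    """Add `tag` to a comma-separated caption file unless it is already there."""
    text = caption_path.read_text(encoding="utf-8").strip()
    tags = [t.strip() for t in text.split(",") if t.strip()]
    if tag not in tags:
        # Assumption: the tag goes first; the existing tags are kept as is.
        tags.insert(0, tag)
    new_text = ", ".join(tags)
    caption_path.write_text(new_text, encoding="utf-8")
    return new_text


def tag_character_images(folder: Path, image_stems: list, tag: str) -> int:
    """Tag only the captions whose file stem is listed; return how many were updated."""
    updated = 0
    for stem in image_stems:
        caption = folder / f"{stem}.txt"
        if caption.exists():
            add_character_tag(caption, tag)
            updated += 1
    return updated
```

For example, tag_character_images(Path("images/25_j03d4"), ["img_03", "img_11"], "h4rl3yq") would touch only those two caption files (the file names here are made up).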

UPDATE 6: Found a faster/smaller method to train. Check my Billie Eilish model to compare the results of this new method:
- using WD14 captioning
- going with 20 face images (15 repeats) and 10 body images (10 repeats)
- this workflow results in a much smaller LoRA and trains in under 25 minutes.
- 10 epochs, 2 batch size, network 32/32, no regularization
Have attached a new json file for the above workflow.
Looking forward to your feedback on the original vs new workflow.

UPDATE 5: For all my latest LoRAs, I am doing the following:
- skipping the captioning altogether
- sticking with 30 face images (40 repeats) and 25 body images (20 repeats)
- load the json file and train as is (with directories changed, ofc)
The models trained with these settings are far better than the earlier ones where I had more images/repeats.
You can also get away with fewer images (check out my Alluux model); just make sure to increase the repeats to reach at least 1500 steps total with a body:face ratio of 1:3.
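
To work out the repeats for a smaller dataset, something like the following could help. It is a rough sketch that assumes the 1:3 body:face ratio refers to steps (images x repeats) rather than image counts, which is my reading of the rule above. The function name and return keys are made up for illustration.

```python
def repeats_for_target(face_images: int, body_images: int,
                       target_steps: int = 1500,
                       body_parts: int = 1, face_parts: int = 3) -> dict:
    """Pick per-folder repeats so total steps reach the target at the given ratio."""
    total_parts = body_parts + face_parts
    # Integer ceiling division: round repeats up so the target is always reached.
    face_repeats = -(-(target_steps * face_parts) // (total_parts * face_images))
    body_repeats = -(-(target_steps * body_parts) // (total_parts * body_images))
    return {
        "face_repeats": face_repeats,
        "body_repeats": body_repeats,
        "total_steps": face_images * face_repeats + body_images * body_repeats,
    }


if __name__ == "__main__":
    # 15 face images and 8 body images, aiming for at least 1500 steps.
    print(repeats_for_target(15, 8))
```

With 15 face and 8 body images this gives 75 face repeats and 47 body repeats, for 1501 steps in total.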

IMPORTANT: When selecting images for training, try and avoid repeats of the same dress/hairstyle/lighting. Since we are training without captions, it will overfit to include that feature in every generation.

UPDATE 4: The latest kohya_ss is broken with this workflow. The issue is listed here:
AttributeError: PIECEWISE_CONSTANT · Issue #770 · TimDettmers/bitsandbytes · GitHub
It works with other schedulers, but the results aren't that great. I have the dataset ready for a few models but am stuck because of this. Does anybody have a solution?
If anyone is facing this issue, I bypassed it by commenting out the following lines in train_util.py:

    if name == SchedulerType.PIECEWISE_CONSTANT:
        return schedule_func(optimizer, **lr_scheduler_kwargs)  # step_rules and last_epoch are given as kwargs

UPDATE 3: Released a new model that was trained with captions. The only things I've changed are setting the caption extension to .txt in the GUI and changing alpha to 64.
Training time and results are pretty close.
It turns out prompt adherence is better when training with captions, and it reaches likeness faster.
Do check out Saori Hara - v1.0 trained with caption | Stable Diffusion LoRA | Civitai
Let me know which version you guys prefer.

UPDATE 2: Still working on finding the best settings for training with captions, but nothing looks as good as training without. I have released a couple of new models without captions. Will try to get a version trained with captions out soon.
When training without captions, I've also found that limiting my dataset to 30 face and 20 body photos (high quality) and ~1500 steps produces better results (less overfitting, I guess).

UPDATE: I'm an idiot. This guide has a couple of mistakes.
- The kohya_ss training I was using did not read the captions at all, since they were in the wrong format. Please change the caption extension to .txt in the GUI so that the captions are picked up.
- When naming the image folder (e.g., 15_j03d4), you're supposed to add the class name at the end, like so: 15_j03d4 man.
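
A quick way to sanity-check folder names against this convention is a small parser. This is just an illustrative sketch: the regular expression is my own and is not how kohya_ss reads the folder name internally.

```python
import re

# <repeats>_<token>, optionally followed by a space and the class name.
FOLDER_PATTERN = re.compile(r"^(?P<repeats>\d+)_(?P<token>\S+)(?:\s+(?P<class_name>.+))?$")


def parse_folder_name(name: str) -> tuple:
    """Split a folder name like '15_j03d4 man' into (repeats, token, class_name)."""
    match = FOLDER_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"Folder name does not look like '<repeats>_<token> <class>': {name!r}")
    # class_name is None when the folder name has no class part.
    return int(match["repeats"]), match["token"], match["class_name"]


if __name__ == "__main__":
    print(parse_folder_name("15_j03d4 man"))
    print(parse_folder_name("15_j03d4"))
```

A returned class name of None flags a folder that is missing the class part described above.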

All my models were trained with these faults baked in, the results aren't bad per se but I will be shifting to a different workflow with these fixed moving forward. I apologize to anyone who followed this guide and ended up with subpar results.

Notes:
- On an RTX 3080, it takes an hour to train a single LoRA.

- ADetailer is required in order to fix images where the face is not the focus.

- This guide is just what I've stumbled onto while following various other guides, and there is still scope for improvement (for example, moles and marks on people's faces seep into the final LoRA).

- Please reach out with any feedback or suggestions to improve this workflow.

1. Image Collection:

Face Images (30-50 images):

Obtain 30-50 images of the person's face with varied angles, lighting, and hairstyles.
Try to avoid low-res images.
Crop each face image to a size of 512x512 using presize.io or any other suitable tool.

Body Images (10-30 images):

Collect 10-30 images of the person's body with varied angles, lighting, and clothing.
Crop each body image to a size of 512x768.
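
If you would rather script the cropping, the sketch below computes a centered crop box for the 512x512 and 512x768 targets. It is an automatic stand-in for the manual cropping described above, not the same thing: a centered crop can cut off a face that is off-center, so check the results. The helper names are hypothetical, and crop_and_resize assumes the third-party Pillow library is installed.

```python
def center_crop_box(width: int, height: int, target_w: int, target_h: int) -> tuple:
    """Largest centered box with the target aspect ratio: (left, top, right, bottom)."""
    if width * target_h > height * target_w:
        # Image is too wide for the target ratio: trim the sides.
        new_w = round(height * target_w / target_h)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Image is too tall (or already matches): trim top and bottom.
    new_h = round(width * target_h / target_w)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


def crop_and_resize(src: str, dst: str, target_w: int, target_h: int) -> None:
    """Center-crop an image file and resize it to the target size (needs Pillow)."""
    from PIL import Image  # imported here so the box math works without Pillow

    with Image.open(src) as img:
        box = center_crop_box(img.width, img.height, target_w, target_h)
        img.crop(box).resize((target_w, target_h)).save(dst)
```

For example, crop_and_resize("face_01.jpg", "face_01_512.png", 512, 512) for a face image, or the same call with 512, 768 for a body image (file names are placeholders).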

Folder Naming:

Name the parent folder using a shortened version of the person's name (e.g., Joe Danger becomes j03d4).

Subfolders:

Create an images folder inside the parent folder, with two subfolders in it:
- 16_j03d4 (body images; the leading number is the repeat count, range 16-20).
- 25_j03d4 (face images; repeat count in the range 25-30).
I aim for 1200-1500 steps for the face and 400-700 steps for the body.
The steps for a folder are the number of images in it multiplied by the repeat count in the folder name.
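
The step count can be checked with a short script before training. This is a minimal sketch of the images x repeats rule only; it ignores epochs and batch size, and the list of image extensions is my own assumption.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def steps_per_folder(images_dir: Path) -> dict:
    """Return {subfolder name: number of images x repeat count from the folder name}."""
    steps = {}
    for sub in sorted(p for p in images_dir.iterdir() if p.is_dir()):
        repeats_text = sub.name.split("_", 1)[0]
        if not repeats_text.isdigit():
            continue  # not a '<repeats>_<name>' training folder
        n_images = sum(1 for f in sub.iterdir() if f.suffix.lower() in IMAGE_EXTS)
        steps[sub.name] = n_images * int(repeats_text)
    return steps


if __name__ == "__main__":
    for name, count in steps_per_folder(Path("j03d4/images")).items():
        print(f"{name}: {count} steps")
```

For example, 50 face images in 25_j03d4 come out to 1250 steps, which sits inside the 1200-1500 range for the face.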

Captioning:

Use the WD14 captioning option in kohya_ss for both face and body images.

Tag Management:

Utilize BooruDatasetTagManager to remove unwanted tags from the images.
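
BooruDatasetTagManager is a GUI tool, but for bulk removals the same cleanup can be approximated with a script. A rough sketch, assuming the captions are comma-separated tags in .txt files; the function names and the example tags are only for illustration.

```python
from pathlib import Path


def remove_tags(caption: str, unwanted: set) -> str:
    """Drop unwanted tags from a comma-separated caption, keeping the original order."""
    tags = [t.strip() for t in caption.split(",")]
    return ", ".join(t for t in tags if t and t not in unwanted)


def clean_caption_files(folder: Path, unwanted: set) -> int:
    """Rewrite every .txt caption in `folder`; return how many files were changed."""
    changed = 0
    for path in sorted(folder.glob("*.txt")):
        original = path.read_text(encoding="utf-8").strip()
        cleaned = remove_tags(original, unwanted)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

For example, clean_caption_files(Path("j03d4/images/25_j03d4"), {"watermark", "text"}) strips those two tags from every caption in the face folder.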

Folder Structure:

Create two empty folders within the parent folder:
- model
- logs

2. Directory Configuration:

JSON Configuration:

Update the directories in the provided JSON file with the appropriate paths for images, model, and logs.
These settings are for the RTX 3080; some of them, like precision and optimizer, may need to be changed for other cards.
I don't use buckets, as I've found the results are better without them.
I also use the default checkpoint, as I've found the resulting LoRA to be more consistent when used with other models, but you could probably get away with training on other models.
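
Updating the directories can be done in any text editor, but a script is handy when training many models. The sketch below assumes the JSON uses the keys train_data_dir, output_dir, logging_dir and reg_data_dir; please check these names against the attached file before relying on it. The function name is hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def update_config_paths(config_path: Path, parent: Path,
                        reg_dir: Optional[Path] = None) -> dict:
    """Point the training JSON at <parent>/images, <parent>/model and <parent>/logs."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Assumed key names; every other setting in the file is left untouched.
    config["train_data_dir"] = str(parent / "images")
    config["output_dir"] = str(parent / "model")
    config["logging_dir"] = str(parent / "logs")
    if reg_dir is not None:
        config["reg_data_dir"] = str(reg_dir)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

For example, update_config_paths(Path("preset.json"), Path("D:/lora/j03d4"), Path("D:/lora/regularization_man")), where the paths are placeholders for your own setup.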

3. Regularization Images:

Download regularization images from this repository: GitHub - Luehrsen/sd_regularization_images: Pre-Rendered Regularization Images for use with fine-tuning, especially for the current implementation of "Dreambooth".
Pick the specific classification folder you need and place it next to the parent folder, as shown in the final directory structure below.

4. Training Configuration:

Training Epochs and Steps:

Train for 3 epochs - could probably get away with 2.
Aim for 1500-2000 steps during training.

Final Directory Structure:

- j03d4
  - images
    - 16_j03d4 (Body images, range 16-20)
    - 25_j03d4 (Face images, range 25-30)
  - model
  - logs
- regularization_man
  - 1_man
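
The empty folder skeleton above can be created in one go. A minimal sketch: the function name and defaults are my own, and the append_class switch reflects the earlier note about adding the class name to the image folder names (off by default, so it mirrors the tree exactly as drawn).

```python
from pathlib import Path


def create_layout(root: Path, token: str, class_name: str,
                  body_repeats: int = 16, face_repeats: int = 25,
                  append_class: bool = False) -> list:
    """Create the training folder tree under `root` and return the created paths."""
    suffix = f" {class_name}" if append_class else ""
    parent = root / token
    folders = [
        parent / "images" / f"{body_repeats}_{token}{suffix}",  # body images
        parent / "images" / f"{face_repeats}_{token}{suffix}",  # face images
        parent / "model",
        parent / "logs",
        root / f"regularization_{class_name}" / f"1_{class_name}",
    ]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return folders


if __name__ == "__main__":
    for path in create_layout(Path("."), "j03d4", "man"):
        print(path)
```

Running it twice is harmless, since existing folders are left alone.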

Note:

Ensure that the tools and libraries mentioned (e.g., kohya_ss, BooruDatasetTagManager) are installed and configured appropriately.
Adjust the folder names and paths based on your specific setup.