
Celebrity Lora Workflow


I use the following steps to create a celebrity LoRA. Posting them here to share and possibly receive some suggestions for improvements.

0.Updates

UPDATE 9: Most of my models perform really well in closeups but leave much to be desired in other shots. After some experimentation, I realized that my face dataset images were all extreme closeups. For my latest Angourie Rice model, I included slightly more zoomed-out images in the face dataset (showing the neck and a bit of the shoulders). This resulted in much better likeness even without ADetailer. Please refer to the last comment on this article for my process for choosing repeats and image counts. Recently, @Ehtz11 reached out to say that the Prodigy optimizer works really well; I am currently trying it out.

UPDATE 8: Some changes to the latest training method:
- going with 25 face images (25 repeats) and 15-20 body images (16 repeats)
- this change results in much better likeness.
- You will have to test the models; typically the 6th-10th epochs all achieve likeness to varying degrees. Use an X/Y/Z plot and pick the best one.
This change was based on @Lucifael's feedback that likeness was lost in anything but closeups.
Face Detailer/ADetailer is still required, but with the increased count - body likeness will be much better.

UPDATE 7: To include a specific style of clothing or a character (e.g. Margot Robbie as Harley Quinn or Katheryn Winnick as Lagertha) while also maintaining general likeness, do the following:
- Out of 20 face images, include 3-4 images of that character
- Out of 10 body images, include 2-3 images of the character
- After normal WD14 captioning, add a character tag to only those images - like l4g3rth4 or h4rl3yq while keeping the rest of the tags as is
- This allows the model to generate that character's look only when the character tag is used in the prompt.
Here are a couple of examples with this approach. Open for feedback/suggestions on improvements.
Katheryn Winnick / Lagertha - v1.0 | Stable Diffusion LoRA | Civitai
Margot Robbie / Harley Quinn - v1.0 | Stable Diffusion LoRA | Civitai
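The tagging step in UPDATE 7 can be scripted. This is a minimal sketch, assuming the WD14 captions are comma-separated tags in a .txt file next to each image; the function name and the choice to put the character tag first are my own assumptions, not something the workflow prescribes.

```python
from pathlib import Path


def add_character_tag(caption_path: Path, tag: str) -> str:
    """Add `tag` to a comma-separated caption file, keeping the other tags as is."""
    text = caption_path.read_text(encoding="utf-8").strip()
    tags = [t.strip() for t in text.split(",") if t.strip()]
    if tag not in tags:
        # Assumption: the character tag goes first; appending it would also work.
        tags.insert(0, tag)
    new_text = ", ".join(tags)
    caption_path.write_text(new_text, encoding="utf-8")
    return new_text
```

Run it only on the caption files of the 3-4 face images and 2-3 body images that show the character, for example `add_character_tag(Path("img_07.txt"), "l4g3rth4")` (the file name is hypothetical).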

UPDATE 6: Found a faster method that produces a smaller LoRA. Check my Billie Eilish model to compare the results of this new method:
- using WD14 captioning
- going with 20 face images (15 repeats) and 10 body images (10 repeats)
- this workflow results in a much smaller LoRA and trains in under 25 minutes.
- 10 epochs, 2 batch size, network 32/32, no regularization
Have attached a new json file for the above workflow.
Looking forward to your feedback on the original vs new workflow.

UPDATE 5: For all my latest LoRAs, I am doing the following:
- skipping the captioning altogether
- sticking with 30 face images (40 repeats) and 25 body images (20 repeats)
- load the json file and train as is (with directories changed, ofc)
The models trained with these settings are far better than the earlier ones where I had more images/repeats.
You can also get away with fewer images (check out my Alluux model); just make sure to increase the repeats to reach at least 1500 steps total with a body:face ratio of 1:3.
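For a smaller dataset, the repeats needed to hit that target can be worked out as below. This is a rough sketch that assumes the 1:3 body:face ratio refers to steps (images x repeats) rather than image counts; the function name and defaults are mine.

```python
import math


def repeats_for_target(face_images: int, body_images: int,
                       total_steps: int = 1500,
                       body_parts: int = 1, face_parts: int = 3) -> tuple[int, int]:
    """Return (face_repeats, body_repeats) that reach at least `total_steps`."""
    parts = body_parts + face_parts
    face_steps = total_steps * face_parts / parts   # 1125 of 1500 by default
    body_steps = total_steps * body_parts / parts   # 375 of 1500 by default
    # Round up so the total never falls below the target.
    return (math.ceil(face_steps / face_images),
            math.ceil(body_steps / body_images))


print(repeats_for_target(15, 5))  # (75, 75)
```

Treat the result as a starting point and adjust after testing; the repeat counts I actually use above (40 and 20) do not follow this split exactly.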

IMPORTANT: When selecting images for training, try to avoid repeats of the same dress, hairstyle, or lighting. Since we are training without captions, the model will overfit and include that feature in every generation.

UPDATE 4: Latest kohya_ss is broken with this workflow. Issue is listed here.
AttributeError: PIECEWISE_CONSTANT · Issue #770 · TimDettmers/bitsandbytes · GitHub
Works with other schedulers but the results aren't that great. Have the dataset ready for a few models but stuck because of this. Anybody have a solution?
If anyone is facing this issue, I bypassed it by commenting out the following lines in train_util.py:
    if name == SchedulerType.PIECEWISE_CONSTANT:
        return schedule_func(optimizer, **lr_scheduler_kwargs)  # step_rules and last_epoch are given as kwargs


UPDATE 3: Released a new model that was trained with captions. The only changes were setting the caption extension to .txt in the GUI and changing alpha to 64.
Training time and results are pretty close.
Turns out prompt adherence is better when training with captions, and the model reaches likeness faster.
Do check out the Saori Hara - v1.0 trained with caption | Stable Diffusion LoRA | Civitai
Let me know which version you guys prefer.

UPDATE 2: Still working on finding the best settings for training with captions, but nothing looks as good as training without. I have released a couple new models without captions. Will try to get a version trained with captions out soon.
When training without captions, I've also found that limiting my dataset to 30 face and 20 body photos (high quality) and ~1500 steps produces better results (less overfitting, I guess).

UPDATE:
I'm an idiot. This guide has a couple of mistakes.
- The kohya_ss training I was using did not read the captions at all because they had the wrong file extension. Please set the caption extension to .txt in the GUI so the captions are actually used.
- When naming the image folder (e.g. 15_j03d4), you're supposed to add the class name at the end, like so: 15_j03d4 man.
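To keep the folder names consistent, the corrected naming pattern can be generated and checked with a couple of small helpers. This is only an illustration of the pattern described above (repeats, underscore, instance token, space, class name); both function names are made up.

```python
def dataset_folder_name(repeats: int, token: str, class_name: str) -> str:
    """Build a folder name such as '15_j03d4 man'."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return f"{repeats}_{token} {class_name}"


def parse_dataset_folder(name: str) -> tuple[int, str, str]:
    """Split '15_j03d4 man' into (15, 'j03d4', 'man'); class is '' if missing."""
    prefix, _, rest = name.partition("_")
    token, _, class_name = rest.partition(" ")
    return int(prefix), token, class_name


print(dataset_folder_name(15, "j03d4", "man"))  # 15_j03d4 man
print(parse_dataset_folder("15_j03d4"))         # (15, 'j03d4', '')
```

An empty class name in the parsed result is a quick way to spot folders that still use the old naming.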

All my models were trained with these faults baked in. The results aren't bad per se, but I will be shifting to a workflow with these fixed moving forward. I apologize to anyone who followed this guide and ended up with subpar results.

Notes:
- On an RTX 3080, it takes an hour to train a single LoRA.

- ADetailer is required in order to fix images where the face is not the focus.

- This guide is just what I've stumbled onto by following various other guides, and there is still scope for improvement (for example, moles and marks on people's faces seep into the final LoRA).

- Please reach out with any feedback or suggestions to improve this workflow.


1. Image Collection:

Face Images (30-50 images):

  1. Obtain 30-50 images of the person's face with different angles, lighting, and hairstyles.

  2. Avoid low-resolution images.

  3. Crop each face image to 512x512 using presize.io or any other suitable tool.

Body Images (10-30 images):

  1. Collect 10-30 images of the person's body with different angles, lighting, and clothing.

  2. Crop each body image to a size of 512x768.
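If you prefer to script the cropping, the crop region can be computed like this. It is a sketch of a plain center crop to the target aspect ratio (1:1 for faces, 2:3 for bodies); the function name is hypothetical, and a center crop will not always frame the face well, so manual cropping may still be needed.

```python
def center_crop_box(width: int, height: int,
                    target_w: int, target_h: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered region
    with the same aspect ratio as target_w x target_h."""
    # Compare aspect ratios by cross-multiplying to stay in integers.
    if width * target_h > height * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        new_w = height * target_w // target_h
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Source is taller (or equal): keep full width, trim top and bottom.
    new_h = width * target_h // target_w
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


print(center_crop_box(1920, 1080, 512, 512))  # (420, 0, 1500, 1080)
print(center_crop_box(1000, 2000, 512, 768))  # (0, 250, 1000, 1750)
```

The returned box uses the same (left, upper, right, lower) order that Pillow's `Image.crop` accepts; after cropping, the image still has to be resized to 512x512 or 512x768.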

Folder Naming:

  1. Name the parent folder with a shortened version of the person's name (e.g., j03d4 for Joe Danger).

Subfolders:

  1. Create two subfolders within the images folder:

    • 16_j03d4 (for body images; the number prefix is the repeat count, range 16-20).

    • 25_j03d4 (for face images; repeat count range 25-30).

  2. I aim for 1200-1500 steps for the face and 400-700 steps for the body.

  3. The step count for a folder is the number of images in it multiplied by the repeat count in the folder name.
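As a quick check of that arithmetic, here is a small sketch that reads the repeat count from each folder name and multiplies it by the image count. The image counts below are example values picked from the ranges in this guide, not a recommendation.

```python
def steps_per_folder(image_counts: dict[str, int]) -> dict[str, int]:
    """Map each folder name (e.g. '25_j03d4') to images x repeats."""
    steps = {}
    for folder, n_images in image_counts.items():
        repeats = int(folder.split("_", 1)[0])  # number before the first underscore
        steps[folder] = n_images * repeats
    return steps


# Example: 50 face images at 25 repeats, 30 body images at 16 repeats.
print(steps_per_folder({"25_j03d4": 50, "16_j03d4": 30}))
# {'25_j03d4': 1250, '16_j03d4': 480}
```

Both results land inside the targets above (1200-1500 for the face, 400-700 for the body).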

Captioning:

  1. Use the WD14 captioning option in kohya_ss for both face and body images.

Tag Management:

  1. Utilize BooruDatasetTagManager to remove unwanted tags from the images.

Folder Structure:

  1. Create two empty folders within the parent folder:

    • model

    • logs

2. Directory Configuration:

JSON Configuration:

  1. Update the directories in the provided JSON file with the appropriate paths for images, model, and logs.

  2. These settings are for the RTX 3080 and some settings like precision and optimizer should be changed for other cards.

  3. I don't use buckets as I've found better results that way.

  4. I also use the default checkpoint as I've found the results to be more consistent when used with other models but you could probably get away with other models.
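Updating the directories in the JSON file can be done by hand or with a few lines of Python. This sketch assumes the config uses the keys `train_data_dir`, `output_dir`, and `logging_dir`; check the key names in your copy of the JSON file before relying on it. The function name is made up.

```python
import json
from pathlib import Path


def update_config_paths(config_path, parent_dir) -> dict:
    """Point the training config at <parent>/images, <parent>/model, <parent>/logs."""
    config_path = Path(config_path)
    parent_dir = Path(parent_dir)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Assumed key names; all other settings are left untouched.
    config["train_data_dir"] = str(parent_dir / "images")
    config["output_dir"] = str(parent_dir / "model")
    config["logging_dir"] = str(parent_dir / "logs")
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

For example, `update_config_paths("celebrity_lora.json", "D:/training/j03d4")` (both paths are placeholders) rewrites the file in place.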

3. Regularization Images:

  1. Download regularization images from the GitHub repository Luehrsen/sd_regularization_images (Pre-Rendered Regularization Images for use with fine-tuning, especially for the current implementation of "Dreambooth") and place them in a folder next to the parent folder.

  2. Pick the specific classification folder you need and place them as shown in the folder structure below.

4. Training Configuration:

Training Epochs and Steps:

  1. Train for 3 epochs - could probably get away with 2.

  2. Aim for 1500-2000 steps during training.

Final Directory Structure:

- j03d4
  - images
    - 16_j03d4 (Body images, range 16-20)
    - 25_j03d4 (Face images, range 25-30)
  - model
  - logs
- regularization_man
  - 1_man
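The empty folder tree above can be created in one go. This is a convenience sketch with a made-up function name; the defaults mirror the example names in this guide (j03d4, 25 and 16 repeats, class man), so change them for your own subject.

```python
from pathlib import Path


def make_training_tree(base, token: str = "j03d4", face_repeats: int = 25,
                       body_repeats: int = 16, class_name: str = "man") -> list[Path]:
    """Create the folder layout shown above under `base` and return the paths."""
    base = Path(base)
    dirs = [
        base / token / "images" / f"{body_repeats}_{token}",   # body images
        base / token / "images" / f"{face_repeats}_{token}",   # face images
        base / token / "model",
        base / token / "logs",
        base / f"regularization_{class_name}" / f"1_{class_name}",
    ]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)  # safe to run more than once
    return dirs
```

Calling `make_training_tree("D:/training")` (placeholder path) leaves you with empty folders to drop the cropped images and regularization images into.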

Note:

  • Ensure that the tools and libraries mentioned (e.g., kohya_ss, BooruDatasetTagManager) are installed and configured appropriately.

  • Adjust the folder names and paths based on your specific setup.
