
How I locally create my own SDXL LoRAs (Updated 31 March 2025)


Local LoRA Training Guide

So I was asked recently about my LoRA training and I thought I'd share with you.

Dataset - Images:

Okay so I tend to stick with 30 images. I go for the highest resolution where possible. Because this is SDXL training, the minimum resolution is 1024 x 1024 and I tend to stay well above it.

It's worked well for me so far because the models I've uploaded are for more modern 'celebrities', with high-resolution images readily available on the web, as opposed to 'celebrities' from the 90s and 00s, where you may still find a lot of pictures but the resolution may not be as great.

Dataset - Captioning

I use WD14 captioning and I love the Kohya-ss tool and its built-in captioning feature. I use 3 particular captioning models. Why 3, you may ask? Well, that's because I'm slightly OCD about making sure I have as many descriptive tags as possible.

I use the "Append TAGs" checkbox in Kohya-ss, which means new tags are appended rather than overwriting existing ones. The tool doesn't add duplicate tags found by a different model, which helps a lot.
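A rough Python sketch of what that append-and-dedupe behaviour amounts to (the `append_tags` function is purely illustrative, not part of Kohya-ss):

```python
# Illustrative sketch: merge tag strings from multiple tagger runs,
# preserving first-seen order and skipping duplicates.
def append_tags(existing: str, new: str) -> str:
    """Merge two comma-separated tag strings without duplicates."""
    merged = []
    seen = set()
    for tag in (existing + ", " + new).split(","):
        tag = tag.strip()
        if tag and tag not in seen:
            seen.add(tag)
            merged.append(tag)
    return ", ".join(merged)

print(append_tags("1girl, smile, long hair", "1girl, blue eyes, smile"))
# 1girl, smile, long hair, blue eyes
```

Running all three models in sequence with "Append TAGs" ticked effectively folds their outputs together like this, which is why duplicates across models aren't a problem.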

The WD14 captioning models I use are the SmilingWolf models (all found natively from the drop down list in Kohya-ss):

  1. wd-v1-4-convnextv2-tagger-v2

  2. wd-v1-4-vit-tagger-v2

  3. wd-convnext-tagger-v3

When it comes to cleaning up tags which are not correct (it does happen), I use Booru Dataset Tag Manager. It's a nifty little tool (URL below): https://github.com/starik222/BooruDatasetTagManager

The interface takes a little getting used to. Alternatively, you can open the caption file in Notepad and just delete or add any tags that way too.
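If you'd rather not click through each file, a small script can do the same cleanup in bulk. This is my own sketch, not part of Booru Dataset Tag Manager; the folder path and tag in the usage line are placeholders:

```python
from pathlib import Path

# Minimal sketch: strip one incorrect tag from every .txt caption file
# in a dataset folder (each file holds comma-separated WD14 tags).
def remove_tag(dataset_dir: str, bad_tag: str) -> None:
    for txt in Path(dataset_dir).glob("*.txt"):
        tags = [t.strip() for t in txt.read_text(encoding="utf-8").split(",")]
        kept = [t for t in tags if t and t != bad_tag]
        txt.write_text(", ".join(kept), encoding="utf-8")

# Example usage (placeholder paths/tags):
# remove_tag("20_ohwx woman", "mole")
```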

Dataset - Training Parameters

FYI - Having the advantage of a fairly new PC rig and an RTX 4090, I train locally.

UPDATE: I also train using the CivitAI trainer. See separate guide.

When it comes to number of repeats, I tend to go with 20. So my dataset folder containing my images with their respective caption text files looks like "20_ohwx woman"
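Kohya-ss reads the repeat count from that folder-name prefix: "20_ohwx woman" means 20 repeats of the concept "ohwx woman". A minimal sketch of that parsing (my own helper, not a Kohya function):

```python
# Kohya-ss dataset folders are named "<repeats>_<concept>", e.g. "20_ohwx woman".
def parse_folder(name: str) -> tuple:
    """Split a Kohya-style dataset folder name into (repeats, concept)."""
    repeats, _, concept = name.partition("_")
    return int(repeats), concept

print(parse_folder("20_ohwx woman"))  # (20, 'ohwx woman')
```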

In terms of epochs, I used to set the number to 10 but found (somewhat consistently) that the output from epoch 5 worked best. Anything after 5 was too overtrained, so for me, epoch 5 tended to be the sweet spot.

Given that logic, I didn't see the point of waiting for my training to give me 10 LoRA files if it's usually the 5th file that's the sweet spot. Plus, as a result, I halve my training time. With my RTX 4090 and the dataset described above, I can output a LoRA file in approx 22 minutes.
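To put numbers on that halving: Kohya's rough convention is steps per epoch = images × repeats ÷ batch size, so stopping at epoch 5 cuts this dataset's run from 6000 optimizer steps to 3000. A quick back-of-the-envelope sketch (the `total_steps` helper is mine, not a Kohya function):

```python
# Rough step count, assuming Kohya's convention of
# steps per epoch = images * repeats / batch_size.
def total_steps(images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    return images * repeats * epochs // batch_size

print(total_steps(30, 20, 5))   # 3000 steps when stopping at epoch 5
print(total_steps(30, 20, 10))  # 6000 steps for the full 10 epochs
```

The same arithmetic shows why a larger batch size speeds things up: at batch size 4, the 5-epoch run drops to 750 steps.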

Now rather than list all my parameters, I've uploaded my Kohya-ss LoRA configuration file with this article. Feel free to download and use it. The file should work whether you're training against SDXL or PonyXL.

UPDATE: I've also added a new v2 SDXL json file with updated parameters from what I've learned from training using the CivitAI lora trainer.

Why not try both json configs and compare the difference?

The file has a batch size of 1 configured so anyone looking to train locally should be able to just amend the checkpoint and dataset folder paths and go straight to the training. The file is configured for 10 epochs but I set it to stop at 5 on purpose.

Incidentally, if you've got a good GPU and the VRAM to go with it, feel free to kick up the batch size. I normally have it set to 4, but I lowered it to 1 in the file so anyone with e.g. an RTX 3090 should be able to use the training parameters as they are.

I think this should be enough to get you going. Have a play and feel free to share your feedback. You can DM me or alternatively, leave feedback in the comments section.

Happy training.
