
How do I train with SDXL? Should I do a checkpoint or LoRA?

I'm a noob and I have a lot of art that I like. What is the best way to train my own style with SDXL/1024x1024? Can I do an SDXL checkpoint or should I train an SDXL LoRA? Regardless, what should I download or what way of training do you think is the best?

1 Answer

I would start with a LoRA, to be honest. It's a lot faster to train, both in your own time and in GPU time. Download kohya_ss https://github.com/bmaltais/kohya_ss and run through its setup. Look on the Discord for help with training; there's a channel called training-logs where some folks share what they're doing, including the settings files they use to train their LoRAs.

You'll need to find good subject matter for a LoRA, and once you have the idea, gather some images. Don't use low-resolution images. Don't worry about resizing them all to the same dimensions, either; the training tools account for different image sizes, which is convenient. You'll then want to caption those images. I find Booru Dataset Tag Manager convenient for that: https://github.com/starik222/BooruDatasetTagManager
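Kohya-style trainers generally pair each image with a `.txt` caption file that shares its basename. A minimal sketch of laying that out (the folder name and tags here are made-up placeholders, not a recommended tagging scheme):

```python
from pathlib import Path

# Hypothetical dataset folder; each image gets a .txt caption file
# with the same basename, containing comma-separated booru-style tags.
dataset = Path("my_style_dataset")
dataset.mkdir(exist_ok=True)

# Placeholder captions -- in practice these come from manual tagging
# or an auto-tagger like the WD 1.4 tagger.
captions = {
    "artwork_001.png": "1girl, watercolor, soft lighting, my_style",
    "artwork_002.png": "landscape, mountains, sunset, my_style",
}

for image_name, tags in captions.items():
    caption_file = dataset / Path(image_name).with_suffix(".txt").name
    caption_file.write_text(tags, encoding="utf-8")

print(sorted(p.name for p in dataset.glob("*.txt")))
```

A trigger word shared across all captions (like `my_style` above) is a common way to make the style invocable at generation time.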

I thought this tutorial was a decent place to start: https://www.canva.com/design/DAFcn1l_ulE/view#1 ...it doesn't go too deep into the technical aspects; it just gives you a good overview of how training works and how to caption/tag your training images. People also use the WD 1.4 tagger to automatically generate tags for their images (I use ComfyUI and there's a custom node for it, but Automatic1111 has a plugin for it too, and that's probably the more popular place to run it). So you don't necessarily need to type all the captions by hand.

There are many settings you can tweak. Don't go crazy over them. I think one of the most important things to keep in mind is the network rank settings: network dimension and network alpha. Keep them low, like 32 or less; I think I was even using 16 at one point. The higher these values, the larger the LoRA file will be and the longer it'll take to train. Batch size also affects GPU memory usage.
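Rough intuition for why the network dimension drives file size: a LoRA adds two low-rank matrices per adapted layer, A (dim x d_in) and B (d_out x dim), so parameter count grows linearly with dim. A back-of-the-envelope sketch (the layer widths below are made-up placeholders, not SDXL's actual shapes):

```python
# A LoRA adds two low-rank matrices per adapted layer:
#   A: (dim x d_in) and B: (d_out x dim)
# so parameters per layer = dim * (d_in + d_out), linear in dim.

def lora_params(dim, layer_shapes):
    """Total LoRA parameters over a list of (d_in, d_out) layer shapes."""
    return sum(dim * (d_in + d_out) for d_in, d_out in layer_shapes)

# Hypothetical model: 100 adapted layers, each 1280-wide.
layers = [(1280, 1280)] * 100

for dim in (16, 32, 128):
    params = lora_params(dim, layers)
    size_mb = params * 2 / 1e6  # ~2 bytes per parameter in fp16
    print(f"dim={dim:>3}: {params:,} params, ~{size_mb:.0f} MB")
```

Doubling dim doubles the LoRA's parameter count, which is why keeping it at 32 or below keeps files small and training fast.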

I'm usually training a LoRA for about an hour on a 3090 Ti.

There's a lot of confusing info out there, and some of it is dated. Keep in mind that most people are still experimenting with how they generate their LoRAs, and there's no real one-size-fits-all recipe here, so it can take quite a bit of trial and error. Once you find settings that work for you, you can save them and load them again in kohya_ss, changing only the input/output directories to train something else in the future. So once you get going, it's not so confusing or daunting.
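The save-and-reuse flow above is really just editing a config file. A hypothetical sketch of swapping only the dataset paths in a saved config (the key names here are placeholders, not necessarily kohya_ss's exact schema):

```python
import json
from pathlib import Path

# Hypothetical saved settings -- key names are illustrative only,
# not guaranteed to match kohya_ss's real config format.
base_config = {
    "network_dim": 32,
    "network_alpha": 16,
    "train_data_dir": "datasets/old_style",
    "output_dir": "output/old_style",
}
Path("base_config.json").write_text(json.dumps(base_config, indent=2))

# Reuse the known-good settings for a new subject: swap only the paths.
config = json.loads(Path("base_config.json").read_text())
config["train_data_dir"] = "datasets/new_style"
config["output_dir"] = "output/new_style"
Path("new_style.json").write_text(json.dumps(config, indent=2))

print(config["network_dim"], config["train_data_dir"])
```

The point is that the tuned hyperparameters stay fixed; only the input/output locations change per project.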

You'll be surprised what you can get with a dataset of around 30 images and about an hour of training.
