I get asked a lot about my LoRA training process, so I've decided to put together a detailed description of my workflow with help from AI.
This is the method I use to create my SDXL-based LoRAs on CivitAI.
Character LoRAs:
I use between 50 and 150 screenshots, depending on how many high-quality images are available for the character; ideally, I aim for around 100. These include a mix of upper-body shots, full-body shots, and close-ups of the face. I don't crop the screenshots: I've found that keeping the original composition, even if it includes multiple people, often produces better results, especially when the LoRA is later used in scenes with multiple characters present. Starting with the Illustrious model, I also began incorporating images from booru sources like Danbooru and Gelbooru to pad out the datasets.
For tagging, I always start with the character's name, then use WD Tagger to generate additional tags. Here's a key point about tagging: for characters with a consistent appearance, I remove specific character-property tags like "blue eyes" or "long hair" from the training dataset itself. Removing these properties folds them into the character's trigger word (the character's name), which allows for simpler prompting later – you can often just use the character's name. However, I do include these specific tags in the prompt when I want to ensure those features are present. So, while they aren't in the training tags, they are still used when prompting.
However, if the character has a variety of styles and appearances, I take a different approach. In these cases, I only remove the most common tags that are consistently present across all images. I keep the tags that are specific to certain styles or appearances. This allows me to prompt for the different styles more easily. For example, if a character has both a "magical girl" and a "casual" outfit, I would keep tags like "magical girl outfit" and "casual clothes" in the dataset.
This approach allows for flexibility. My LoRAs can be used with or without the specific attribute tags during prompting. While the tags are sometimes removed from the training data (depending on the character's consistency), including them in the prompt always allows for finer control over the character's appearance. For example, even though "black hair" isn't a training tag (in consistent characters), including it in the prompt will still produce a character with black hair. This is why you'll often see detailed prompts in my LoRAs, even though the training dataset itself may or may not have those specific tags, depending on the character's variations.
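As a rough sketch of this tag handling, here's how the pruning step for a consistent character might look, assuming each image has a matching comma-separated .txt caption file produced by WD Tagger. The trigger word and the pruned tags below are hypothetical placeholders, not from any real dataset:

```python
from pathlib import Path

TRIGGER = "jane_doe"                 # hypothetical character trigger word
PRUNE = {"blue eyes", "long hair"}   # consistent traits folded into the trigger

def rewrite_caption(text: str) -> str:
    """Drop pruned traits and make sure the trigger word comes first."""
    tags = [t.strip() for t in text.split(",") if t.strip()]
    kept = [t for t in tags if t not in PRUNE and t != TRIGGER]
    return ", ".join([TRIGGER] + kept)

def process_dataset(folder: str) -> None:
    # Rewrite every caption file in place.
    for caption in Path(folder).glob("*.txt"):
        caption.write_text(rewrite_caption(caption.read_text()))
```

For a character with multiple looks, you would simply shrink the PRUNE set to only the traits shared across every image, leaving style-specific tags like "magical girl outfit" untouched.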
I use the default CivitAI parameters for training. The only adjustments I make are setting clip skip to 2 and raising or lowering the number of repeats or epochs to keep the total steps within a specific range. To avoid overtraining, I generally keep the total steps under 800; a good range is usually between 650 and 800.
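The repeats/epochs adjustment is just arithmetic. As a sketch, assuming the common convention that steps = (images × repeats × epochs) ÷ batch size (how CivitAI's trainer computes this internally may differ):

```python
def total_steps(images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """Estimate total training steps under the usual images*repeats*epochs convention."""
    return images * repeats * epochs // batch_size

# e.g. 100 images at 1 repeat would need 7 epochs to land at 700 steps,
# inside the 650-800 target range
print(total_steps(100, 1, 7))  # 700
```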
Style LoRAs:
The process for training style LoRAs is similar to character LoRAs, but there are some key differences. Most notably, I don't remove any tags for style LoRAs; I keep all the tags associated with the images. Instead, I simply add the style's trigger word at the beginning of each image's tags in the dataset. The training dataset for style LoRAs is also much more varied. I include images of people, other creatures, objects, and landscapes—anything that exemplifies the style I'm trying to capture. I also don't typically use booru sources for style LoRAs. Finally, the dataset size and training parameters are different. For these LoRAs, I typically use between 200 and 600 images and aim for a total step count between 850 and 1500, with most falling between 1000 and 1200.
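The style-LoRA tagging rule is the mirror image of the character one: keep every tag and just prepend the trigger. A minimal sketch, again assuming comma-separated .txt captions and using a hypothetical trigger word:

```python
from pathlib import Path

STYLE_TRIGGER = "inkwash_style"  # hypothetical style trigger word

def add_trigger(text: str) -> str:
    """Prepend the style trigger while leaving all existing tags intact."""
    tags = [t.strip() for t in text.split(",") if t.strip()]
    if tags and tags[0] == STYLE_TRIGGER:
        return text  # already tagged; don't duplicate the trigger
    return ", ".join([STYLE_TRIGGER] + tags)

def process_dataset(folder: str) -> None:
    for caption in Path(folder).glob("*.txt"):
        caption.write_text(add_trigger(caption.read_text()))
```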