This is my first article on Civitai. I hope you find it helpful.
I won't go over all the installation stuff, because there are plenty of videos on YT that tell you how to do that part. Here's one:
Now, on to the good stuff...
Once you have Kohya SS up and running, let's concentrate on one of THE most important things: a good dataset! Because SDXL trains at a high resolution, you will need sharp, clear pics for your training dataset.
From my experience, 10-20 good face shots and a mix of 20 half body and full body pics works pretty well. Be sure to collect a good mix of angles and expressions to avoid an inflexible character. It's also a good idea to tag glasses and jewelry to avoid them showing up in every render.
If your subject has different hair colors/styles, tag it.
Note: Avoid pics with hands/fingers near the face of your subject. Trust me.
If there is any kind of text or watermark on the pics, it's best to remove it with photo editing software.
For a good training session, make sure your dataset pics are a minimum of 1024x1024. Kohya will do bucketing, but low resolution pics will screw up your training.
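If you want to check the whole folder at once instead of eyeballing every pic, here is a small sketch using Pillow (not part of Kohya; the folder path is just an example following the naming convention used later in this article):

```python
from pathlib import Path

from PIL import Image  # pip install Pillow


def find_small_images(folder, min_side=1024):
    """Return (filename, size) for images whose shortest side is below min_side."""
    exts = {".jpg", ".jpeg", ".png", ".webp"}
    small = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in exts:
            continue
        with Image.open(path) as img:
            if min(img.size) < min_side:
                small.append((path.name, img.size))
    return small


# Usage: list any offenders before you start training, e.g.
# for name, (w, h) in find_small_images("dataset/10_K4thy, woman"):
#     print(f"Too small: {name} ({w}x{h})")
```

Anything it flags is a candidate for upscaling or deletion.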
Also, if you have too many pics with the same outfit, the model will show bias towards that outfit. The same goes for background scenery. Variety is the spice of training!
Remember, 1 or 2 bad pics can ruin your training. If you notice that your samples are funky after 600 steps, stop the training and start over after you delete the problem pics.
I use Topaz Gigapixel to upsize pics when needed due to the insanely good results I get from it, and the ability to sharpen or soften pics. There are some pretty good free upscalers out there if you can't get Topaz. Do your research.
Here are a few pieces of free/paid software that will help you:
Once you have your dataset sorted and fixed up, it's time to start tagging. I have used a few different taggers, and found the one included in Kohya to be the best. My personal preference of course. I use the WD14 moat tagger to do booru style tagging.
Run the tagger, and then load the tagged dataset into Booru Dataset Tag Manager and delete the bad tags and add tags it missed. Some people say it's better to tag by hand, and by all means, if you have that kind of patience, do your thang! I have found that my method is best to avoid a three hour tagging session!
Make sure to put your unique token and classification tag first when running the tagger. It is a good idea to make a text document of undesired tags, so you don't have to type all of that in each time you use it. Here is a small set of undesired tags you can start with:
These are just a few of the ones I have used recently. Modify to suit your needs.
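If you'd rather not click through every caption by hand, the same cleanup can be scripted. The WD14 tagger writes one comma-separated `.txt` caption file per image, so a short sketch like this can strip your undesired tags and force the unique token + class tag to the front (the `K4thy, woman` token pair is just an example; swap in your own):

```python
from pathlib import Path


def clean_tags(caption, undesired, prepend=("K4thy", "woman")):
    """Drop unwanted tags and make sure the unique token + class tag come first."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    tags = [t for t in tags if t not in undesired and t not in prepend]
    return ", ".join(list(prepend) + tags)


def load_undesired(path):
    """One undesired tag per line, e.g. the list above saved as a text file."""
    return {line.strip() for line in open(path, encoding="utf-8") if line.strip()}


def clean_folder(folder, undesired):
    """Rewrite every caption .txt file in the dataset folder in place."""
    for txt in Path(folder).glob("*.txt"):
        txt.write_text(clean_tags(txt.read_text(encoding="utf-8"), undesired),
                       encoding="utf-8")
```

Run it once after tagging, then do a final pass in Booru Dataset Tag Manager to add anything the tagger missed.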
So now you have your spiffiest pics and extra super duper tagging done, let's move on to the training settings.
First of all, I have very little knowledge or understanding of the science behind what I am about to tell you. (Maybe a little) I watch videos, I run tests, I get results. Hopefully, my eff-ups will now help you get good results!
Repeats: For a dataset of less than 75 pics, I use 10 repeats. Folder name: 10_K4thy, woman
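For anyone new to Kohya's folder convention: the repeat count is the number prefix on the dataset folder itself, with the unique token and class after the underscore. A typical layout (names here are just examples) looks like:

```
dataset/
└── 10_K4thy, woman/
    ├── 001.png
    ├── 001.txt
    ├── 002.png
    ├── 002.txt
    └── ...
```

Each image sits next to its matching caption `.txt` file from the tagging step.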
6 to 10 epochs works pretty well, saving every epoch; just delete the first 3 epochs after training, as they are most likely useless.
Before I go any further, if you don't have at least an RTX 3060 with 12GB of VRAM, stop here and go sell a kidney and get a better GPU. SDXL training will NOT work on a GPU with less than 12GB.
Okay, moving on...
Base checkpoint: SDXL 1.0 (it's best to use the base, as it will work on the majority of SDXL based merges)
save precision: bf16
Train batch size: 1 (don't argue, just pick 1 unless you have a super computer with a $5000 GPU!)
Epochs: for 30-50 training pics, choose 10; for 50-100 training pics, use 4, 5, or 6. The goal is to train for approximately 4000 to 5000 steps.
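The step count works out to images × repeats × epochs ÷ batch size, so you can sanity-check your epoch choice with a one-liner (my sketch, not Kohya code):

```python
def total_steps(n_images, repeats, epochs, batch_size=1):
    """Rough total training steps: images x repeats x epochs / batch size."""
    return n_images * repeats * epochs // batch_size


# 40 pics at 10 repeats for 10 epochs:
print(total_steps(40, 10, 10))  # 4000 -- right in the sweet spot
# 80 pics at 10 repeats, so drop to 5 epochs:
print(total_steps(80, 10, 5))   # 4000
```

If the number lands well outside 4000-5000, adjust epochs (or repeats) before you hit start.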
Cache latents: true
Cache latents to disk: true
LR scheduler: constant
Optimizer: adafactor (stop! Don't try whatever so and so said to use! This is the only one that has worked for me!)
Optimizer extra arguments: "scale_parameter=False relative_step=False warmup_init=False" (remove quotes)
Learning rate: 0.0003
LR warmup: 0
Max resolution: 1024,1024
Enable buckets: true
Max bucket size: 1024
No Half VAE: True (keeps the VAE in full precision, which prevents NaN/black-image errors with the SDXL VAE; note it actually uses slightly more VRAM, not less)
Text Encoder learning rate: 0.0003
Unet learning rate: 0.0003
Network Rank (Dimension): 64, or 32 if you're getting NaN errors.
Network Alpha: 8, or half of Rank. (it's your life, do what you want)
Save every N steps: I use 500, sometimes 1000, as I'm saving every epoch anyway.
Gradient checkpointing: true
Save training state: true
Sample every n steps: 100 or whatever floats your rubber ducky.
In the sample prompt box, use a decent prompt that will give you the type of output you want.
Example:
masterpiece, best quality,a photo of S4br1n4, woman, face focus, medium breasts, long hair, outdoors, --n futanari,deformed,deformed eyes,ugly face,cartoon, --w 1024 --h 1024
It's not a bad idea to put two prompts here, Imo.
If I haven't pointed out a setting, leave it at default!
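For reference, the settings above gathered in one place, written as an sd-scripts style TOML config. The key names mirror sd-scripts' command-line flags and may differ slightly from the labels in the Kohya GUI, so treat this as a cheat sheet rather than a drop-in file; the model and folder paths are placeholders.

```toml
pretrained_model_name_or_path = "sd_xl_base_1.0.safetensors"
train_data_dir = "dataset"        # contains the 10_K4thy, woman/ folder
resolution = "1024,1024"
enable_bucket = true
max_bucket_reso = 1024
train_batch_size = 1
max_train_epochs = 10
save_every_n_epochs = 1
learning_rate = 3e-4
unet_lr = 3e-4
text_encoder_lr = 3e-4
lr_scheduler = "constant"
lr_warmup_steps = 0
optimizer_type = "Adafactor"
optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False" ]
network_dim = 64
network_alpha = 8
mixed_precision = "bf16"
save_precision = "bf16"
cache_latents = true
cache_latents_to_disk = true
gradient_checkpointing = true
no_half_vae = true
save_state = true
sample_every_n_steps = 100
```

Anything not listed here stays at its default, same as in the GUI.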
Hit the start training button and be prepared to wait 3 to 5 hrs for it to complete.
Btw, don't use any other programs while the training is running! Kohya is slamming your resources, and anything you run will slow down the training! I am able to watch YouTube vids, but I have an i9 processor and 64GB of system RAM. Better not to chance it.
My latest Loras: