
What to Expect When Training a Concept LoRA? An Example of My LoRA Training Process


There are lots of guides and options you can follow to make a LoRA.

Some people take the process seriously and handle it very carefully, some are skeptical about the dataset, some blindly follow other people's training parameters; some end up making good LoRAs, some bad ones.

In this article, I'll show you my current training process. Please don't follow my settings blindly; adjust them if needed, and keep learning on your own.

I'll talk about training a LoRA on the NoobAI checkpoint, but the same applies to any other anime-based checkpoint trained on Danbooru tags.

I personally like making concept LoRAs for a whole scene, so the advice I give may be weird and different from other people's.

Before starting, you should already understand what a LoRA is, how to choose a checkpoint to train on, how to tag, and how to adjust training parameters.

Please read some articles before reading this one, because I'm not going to explain the basics. I'll go entirely by my own opinion and what I'm currently doing:

https://arcenciel.io/articles/1

https://arcenciel.io/articles/3

https://arcenciel.io/articles/9

Begin

Training a LoRA that works for yourself is easy; training one that is consistent and compatible is way harder.

A concept LoRA needs the same tagging process as a character LoRA, even the same training parameters. The difference is that in some cases it's harder to figure out whether it has errors.

I'll write more articles in the future: a series on my training process for the LoRAs I post on the site, plus another series on the problems you'll encounter when training LoRAs. Stay tuned!

I'm going to use Google Colab to train the LoRA, not the onsite CivitAI trainer or local training.

Screenshot 2025-08-06 193406.png

Here are the steps I follow for training a LoRA:

  • Dataset Gathering: Try your best to gather 5-50 images. More may be better, but with these SDXL checkpoints you don't need that many images to train a LoRA.

  • Image Processing: Get rid of noise and upscale your images to resolutions the training checkpoint supports, in this case up to a maximum of 4096, e.g. 1024x1024, 832x1216. Make sure to remove text, watermarks, and signatures if applicable.

  • Tagging: The most important part; about 70% of your success depends on it. I spend most of my time here. Blame yourself and the way you tag, don't blame the model for not understanding.

  • Setting Up the Trainer: Grab the basic tomls from the Arc en Ciel Discord server, use them, and adjust them on your own.

  • Training: The easiest part; just a few clicks to start. It takes me about 5-30 minutes to cook a LoRA now with the LoRA Easy Training Colab.

  • Testing: The worst part; this is where the depression starts. Test with different characters, then with character LoRAs, then with styles (optional) to see how compatible your LoRA is.

  • Retraining if something went wrong, or just giving up: I'm here most of the time.

Below is general information about each step; for specific details, scroll down to the REAL CASE TRAINING section.

1/ Dataset Gathering:

First of all, Danbooru and other booru sites like Gelbooru and Safebooru will be your friends; they're the tools you'll use most of the time for gathering and tagging the dataset.

You can also search the rest of the internet for dataset images, do whatever you want, but make sure to follow the rules and the law! :D

Screenshot 2025-08-06 193452.png
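If you prefer to script the gathering instead of clicking through the site, something like this works as a starting point. It's a minimal sketch assuming Danbooru's public /posts.json endpoint and its "id"/"file_url" fields, plus the two-tag limit for anonymous searches; the tag "pointing_at_viewer" is just an example. Check the site's API docs and rules before scraping :D

```python
# Sketch: pull candidate dataset images from Danbooru's public JSON API.
# Assumptions: the /posts.json endpoint, the "id"/"file_url" fields, and the
# two-tag limit for anonymous searches. Respect the site's rules and rate limits.
import requests
from pathlib import Path

out = Path("dataset_raw")
out.mkdir(exist_ok=True)

resp = requests.get(
    "https://danbooru.donmai.us/posts.json",
    params={"tags": "pointing_at_viewer rating:general", "limit": 20},
    timeout=30,
)
resp.raise_for_status()

for post in resp.json():
    url = post.get("file_url")
    if not url:
        continue  # some posts hide the file from anonymous users
    ext = url.rsplit(".", 1)[-1]
    (out / f"{post['id']}.{ext}").write_bytes(requests.get(url, timeout=60).content)
```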

Depending on what you want to train (a character, a pose, pieces of clothing, a background, etc.), there are different things to keep in mind when building a dataset.

Anything that can be defined can be considered a concept. There are many kinds of concepts, and each one you train is its own unique training process: they may share training parameters, but the way you gather, process, and tag the images affects the whole process for each concept.

Here are some things to keep in mind when gathering a dataset:

  • Pose: I prefer images from a single angle only; I don't recommend making a pose LoRA with multiple angles.

  • Clothes: Get the outfit on as many different characters as possible. Different poses may be good too, though maybe not for clothes with writing on them.

  • Scene reference, background: Avoid sketch/monochrome images or images that all share one art style, unless you want to go back and forth retagging and retraining.

  • Multiple people: This is biased, but I prefer humans or half-breeds with as little fur as possible: no aliens, no objectification, avoid unusual skin colors, avoid characters drawn as sketches, unfortunately.

  • Expression: Same as clothes, but try to get images consistently with or without an open mouth. E.g. for a smug face with a closed mouth, avoid all open-mouth smug faces.

  • Multiple concepts: Good luck, figure those out by yourself.

  • Images with text: Same patterns, same text positions, same aspect ratio (if applicable, make them the same resolution). As long as the text position is consistent, 5-10 images are enough to train.

  • If it's a character, go read other articles.

Try to gather between 5-50 images. I have better success training a LoRA on 10-30 images; for me the sweet spot is somewhere around 10-12 or 28-30.

Screenshot 2025-08-06 193657.png

You can gather more images if you want, but current-era checkpoints don't require a large dataset; get enough images and you're good to go.

2/ Image Processing:

You don't want the model to learn anything unnecessary, so crop all the unnecessary stuff out.

Upscale your images and get rid of as much noise as possible. My favorite upscalers are 4x-AnimeSharp and RealESRGAN_x4plus_anime_6B.

Before upscaling, try your best to remove text, watermarks, and signatures if applicable.

If you have more time, look for color correction tools and use them on your dataset. Some people do this when training on anime screencaps, but I don't train on anime, so I skip it :D

DO NOT REMOVE BACKGROUND, DO NOT USE TRANSPARENT BACKGROUND

Make sure each image's resolution is in the training range for the checkpoint, in this case between 2048 and 4096, e.g. 1024x1024, 1216x832, etc.

Screenshot 2025-08-06 194559.png
Screenshot 2025-08-06 194405.png

I tend to put images with similar resolutions into one folder for the upscaling process.

If an image is low resolution you may want to upscale it, but if the upscaler makes the quality worse, just use the original, or remove the image from your dataset.
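If you want to automate that sorting, here's a small sketch of what I mean, using only Pillow; the folder layout ("WxH" subfolders under "dataset_raw") is my own convention, not something any tool requires:

```python
# Sort a mixed folder into per-resolution subfolders before upscaling.
from pathlib import Path
from PIL import Image
import shutil

src = Path("dataset_raw")
for img_path in sorted(src.glob("*.*")):
    if img_path.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
        continue
    with Image.open(img_path) as im:
        w, h = im.size
    bucket = src / f"{w}x{h}"          # e.g. "1024x1536"
    bucket.mkdir(exist_ok=True)
    shutil.move(str(img_path), str(bucket / img_path.name))
```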

3/ Tagging:

THIS IS THE MOST IMPORTANT PART. Please be careful and take your time here, unless you want to retrain things multiple times.

I'm using this dataset processor from Jelosus. Go to GitHub, read the instructions, clone it, and use it: https://github.com/Jelosus2/DatasetEditor

Screenshot 2025-08-06 193318.png

Start from whatever tags the tag generator spits out, then remove wrong tags, add missing ones, add trigger words if needed, and prune the tags (optional).

Be specific with tags, but not too specific. Always ask yourself: which parts will the model understand easily, and which parts will it have a hard time learning? Tag the parts you think the model will struggle with more specifically. Tag the rest generally, or remove those tags if you're confident the model understands them by itself.

I take the tags from the tag generator, add some missing clothes/background/action tags, and remove tags like "virtual youtuber", "solo", and any baked-in character tags (e.g. "gotoh hitori").

Removing those tags is technically optional, but please just remove them; you never know how badly they'll act up. I once trained a LoRA where only 4 of the 30 dataset images had the "gotoh hitori" tag, and that one tag turned most of my gens into "shaded face" :(
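If you'd rather do that cleanup in bulk instead of clicking through each file, a sketch like this works on the caption txt files; the blacklist contents are just examples from my own datasets:

```python
# Strip a blacklist of risky tags (baked-in character names, "virtual youtuber",
# ...) from every caption file next to the images.
from pathlib import Path

BLACKLIST = {"virtual youtuber", "gotoh hitori"}  # extend per dataset

for txt in Path("dataset").glob("*.txt"):
    tags = [t.strip() for t in txt.read_text(encoding="utf-8").split(",")]
    kept = [t for t in tags if t and t not in BLACKLIST]
    txt.write_text(", ".join(kept), encoding="utf-8")
```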

I'm specific with actions, expressions, and backgrounds.

A trigger word is optional, unless you're making a multi-concept LoRA, but even then it's still optional :D

Always keep in mind: "Blame the tag, don't blame the model." You can blame yourself for tagging something incorrectly, but don't blame the model for not understanding what you want, unless it's a situation so complicated or illogical that it doesn't even exist in anyone else's mind.

If you ask people who make decent LoRAs, they can yap about the tagging process for a couple of hours, and most of the time you won't believe them until you've made such a LoRA yourself. So figure out what good tagging is on your own; learn from failure, not from success.

PLEASE NOTE THAT THIS IS BASED ON MY OBSERVATION AND OPINION. TRY IT YOURSELF BEFORE BELIEVING ME!

4/ Setting the Trainer (prepare the oven):

If you don't have a decent GPU to train a LoRA, go spend 4-6 hours reading articles and guides about training LoRAs on Google Colab. It's worth learning instead of spending money and time on online training services that limit your training settings.

Once you can make a LoRA on Google Colab, you won't want to come back to any online training service, unless you don't have time to wait for the script to finish.

I'm currently using this Colab from Jelosus: https://github.com/Jelosus2/Lora_Easy_Training_Colab

And the backend script: https://github.com/Jelosus2/LoRA_Easy_Training_Scripts

That script is a fork of: https://github.com/derrian-distro/LoRA_Easy_Training_scripts_Backend

Screenshot 2025-08-06 193406.png
Screenshot 2025-08-06 200523.png

Read this article to understand how to use it: https://civitai.com/articles/4409/almost-local-lora-training-guide

Then grab the toml files from the Arc en Ciel Discord server (https://discord.com/channels/1113893773714399392/1188976775271813200), adjust them to your training dataset, and tweak any other settings you want.

Screenshot 2025-08-06 200806.png

Go to the server FAQ and read the instructions there; they may help you with the training process.

When you check the optimizer and scheduler options, you'll see choices very different from the optimizers available on the CivitAI onsite trainer.

With CAME + REX, you have a great chance of getting what you want in fewer steps, saving a lot of training time.

If a LoRA works well with a dataset of 20 images, batch size 4, 10 repeats, and 10 epochs (20/4 × 10 × 10 = 500 steps on AdamW8Bit + cosine with restarts), then you can make the same LoRA with CAME + REX annealing warm restart (RAWR) in only 50-100 steps. That shows how strong CAME is.
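That step arithmetic is easy to sanity-check yourself; here's a tiny sketch (it ignores bucketing remainders and gradient accumulation, which shift the real number a little):

```python
# steps = ceil(images * repeats / batch_size) * epochs
import math

def total_steps(images: int, batch_size: int, repeats: int, epochs: int) -> int:
    return math.ceil(images * repeats / batch_size) * epochs

print(total_steps(20, 4, 10, 10))  # 500 -> the AdamW8Bit + cosine example
print(total_steps(20, 4, 1, 10))   # 50  -> the CAME + REX ballpark
```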

Imagine training a LoRA with AdamW8Bit or AdaFactor and waiting 0.5-2 hours: what if it fails? Are you willing to retrain more than 3 times? I don't have the patience to wait that many hours.

It only takes me 5 minutes to download the resources and 3-20 minutes to train a LoRA using CAME.

Once you've read and followed the instructions for Google Colab and the Easy Training Script and want to adjust the toml file, these are the settings you'll most likely change:

  • Gradient Accumulation

  • Batch size

  • Learning Rate (LR)

  • Noise offset

  • Weight Decay

Think about how you want to train your LoRA. Here are the parameter ranges I'm currently using with CAME + REX (collected into one sketch after this list):

  • Batch size: 2-4

  • Repeats: 1-3 (depends on the number of images in the dataset; check the Arc en Ciel server)

  • UNET LR: 4e-5 - 8e-5

  • TE LR: 1e-6 - 7e-6

  • Epochs: 5-20

  • weight decay: 0.04 - 0.08

  • Min SNR: 5

  • Network Dimension: 8-16

  • Network Alpha: 4-12

  • Noise offset: 0.0357

  • Pyramid Noise Iterations: 5

  • Pyramid Noise Discount: 0.25

My understanding of batch size: with a batch size of 2, each training step lets the model look at up to 2 images from your dataset at the same time and learn from them. It also depends on bucketing, so it's not always exactly 2 images at once; it may be fewer. A higher batch size helps the model pick up shared patterns better.

CAME will burn your LoRA super fast if you train with high repeats, so play around with 1-3 repeats. If you want it to learn faster, or your dataset has fewer than 10 images, go with 4-5 repeats; that should be fine according to my tests :D

I SUGGEST FIGURING THIS STEP OUT BY YOURSELF, TRYING SETTINGS ONE BY ONE UNTIL YOU FIND WHAT SUITS YOU.

Since each LoRA only takes minutes to train, I'm more confident about training multiple times with all kinds of different settings. That's why my current LoRAs don't share the same training parameters :D

5/ Training:

Read this article to see how to run the script: https://civitai.com/articles/4409/almost-local-lora-training-guide

For me, it takes about 4m30s to complete the first step, 5-10s to confirm Drive folder access, 10-20s to download the checkpoint and VAE, and 10-30s to start the training.

Open Google Colab, go to the script, and download the resources.

You won't need to repeat this setup unless you reopen the Colab.

Remember to put your dataset on Drive before you start training :D

Choose the checkpoint and VAE you're training on.

Upload the dataset and tag it (skip this if you've already done it).

Run, kill the cloudflared, start training.

Screenshot 2025-08-06 201446.png

I usually train on a small dataset of 10-20 images; with the CAME optimizer this takes about 5-15 minutes.

I recommend buying Google Colab Pro; it's the best option for training on a higher-VRAM GPU, and it's far more worthwhile than buying Buzz to train LoRAs on CivitAI.

ABOUT PUTTING YOUR DATASET ON CIVITAI:

  • It won't allow certain tags in your dataset.

  • It automatically checks your dataset's tags, and sometimes refuses to train over supposed privacy violations that I'm sure I didn't commit at all!

  • It costs too much to train on the NOOB checkpoint.

  • The waiting times are sometimes too long.

REAL CASE TRAINING: 

Saiba Momoi Meme

videoframe_655.png

Dataset Gathering:

I gathered images from Safebooru for this one; sometimes I need to give my eyes a rest from the crazy NSFW stuff on Danbooru and Gelbooru.

Screenshot 2025-08-06 202052.png

For this one, I gathered 6 images of the concept.

Screenshot 2025-08-06 203112.png

Image Processing:

I cropped the unnecessary parts out of some images:

Screenshot 2025-08-06 202946.png

Now I upscale the images; I put those with similar resolutions into one folder:

Screenshot 2025-08-06 203353.png

Upscale them:

Screenshot 2025-08-06 203629.png
Screenshot 2025-08-06 203643.png

I saved them as PNG files. It's better to train with PNGs: PNG preserves more detail than JPEG because it uses lossless compression, meaning it retains all the original image data.

From what I've read, PNG is the best option for training images.
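If some of your sources are JPEGs, a quick Pillow sketch converts them in place. One honest caveat: converting can't bring back detail the JPEG already threw away, it only prevents further loss on later edits:

```python
# Re-save every JPEG in the dataset folder as PNG (lossless from here on).
from pathlib import Path
from PIL import Image

for p in Path("dataset").glob("*.jpg"):
    with Image.open(p) as im:
        im.convert("RGB").save(p.with_suffix(".png"))
    p.unlink()  # drop the original JPEG
```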

Tagging:

Open the tag processor, then go to Generate Tags:

Screenshot 2025-08-07 074130.png

Select your input folder and the output folder for the txt files, and make sure to choose a suitable tagger model. My favorite is WDv3Large at a 0.25 threshold.

Screenshot 2025-08-07 074457.png
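To make the threshold concrete: the tagger outputs a confidence score per tag, and the threshold is just a cutoff below which tags are dropped. The scores here are made-up numbers for illustration, not real WDv3Large output:

```python
# What a 0.25 threshold does to a (fabricated) set of tagger scores.
scores = {
    "1girl": 0.99, "pointing at viewer": 0.81, "open mouth": 0.64,
    "blush stickers": 0.31, "virtual youtuber": 0.27, "gloves": 0.12,
}
THRESHOLD = 0.25
kept = [tag for tag, score in scores.items() if score >= THRESHOLD]
print(", ".join(kept))  # "gloves" falls below the cutoff and is dropped
```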

Now go to Process Tags:

Screenshot 2025-08-07 075023.png

Check those 2 boxes (it's optional). It's better to rename the images, because if you keep names with odd symbols, or names that are too long, you sometimes can't extract them after compressing to a zip file.
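If you'd rather rename outside the tool, a sketch like this numbers each image and its matching caption together (it assumes no files are already named 001, 002, ...):

```python
# Rename image/caption pairs to 001, 002, ... so no odd symbols reach the zip.
from pathlib import Path

folder = Path("dataset")
images = sorted(p for p in folder.iterdir()
                if p.suffix.lower() in {".png", ".jpg", ".webp"})
for i, img in enumerate(images, start=1):
    txt = img.with_suffix(".txt")
    img.rename(folder / f"{i:03d}{img.suffix}")
    if txt.exists():
        txt.rename(folder / f"{i:03d}.txt")
```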

After that, go through each image and delete wrong tags, unnecessary tags, and character name tags (optional):

Screenshot 2025-08-07 075130.png

Screenshot 2025-08-07 075209.png

I'll delete these:

Screenshot 2025-08-07 075353.png

Keep doing this for all the images you have. While you work, keep asking yourself: which parts will the model understand easily, and which parts will it have a hard time learning?

If you need to add more tags, tag the parts you think the model will struggle with more specifically. Tag the rest generally, or remove those tags if you're confident the model understands them on its own.

With more than 2 characters in a scene, it's better not to tag all of their hair and eye colors. Just trust the tag generator and go with it.

I'll show you 3 different versions of the tagging and how they affect the training:

The 1st version is the one I'm showing here.

The 2nd is the one where I tag every hair/eye color specifically.

The 3rd is the one without the action tag "pointing at viewer".

Now that you're done removing and adding tags, it's better to bring all the common tags to the beginning, and add a trigger word if needed.

Go back to Process Tags

Screenshot 2025-08-07 081349.png

Here, all of the images share these common tags: "pointing at viewer, open mouth", so those will be my trigger words; I put them on top.

Wait! Before processing the tags, uncheck these boxes, because the tags you just added may be removed if you let the processor apply redundancy removal.

Screenshot 2025-08-07 081708.png
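The same reorder is easy to do by hand on the txt files if you ever need to. Here's a sketch that moves the chosen trigger tags to the front of every caption while keeping the rest of the order intact ("pointing at viewer, open mouth" are this dataset's triggers):

```python
# Put the trigger tags first in every caption file, preserving the other tags.
from pathlib import Path

TRIGGERS = ["pointing at viewer", "open mouth"]

for txt in Path("dataset").glob("*.txt"):
    tags = [t.strip() for t in txt.read_text(encoding="utf-8").split(",") if t.strip()]
    rest = [t for t in tags if t not in TRIGGERS]
    txt.write_text(", ".join(TRIGGERS + rest), encoding="utf-8")
```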

After completing these steps, you can go prepare the trainer.

Setting the Trainer:

As mentioned, I'm currently using the Easy Training Script and the Easy Training Colab to train my LoRAs.

The Colab from Jelosus: https://github.com/Jelosus2/Lora_Easy_Training_Colab

And the backend script: https://github.com/Jelosus2/LoRA_Easy_Training_Scripts

For this concept, I'm using the toml linked further down. Open it on your own.

Okay, let's start:

First, open your Google Drive.

Screenshot 2025-08-07 082358.png

Put your LoRA dataset on Google Drive. I use this path: My Drive/Loras/LoRA_name/dataset

Screenshot 2025-08-07 082411.png

Upload your dataset into the dataset folder:

Screenshot 2025-08-07 082650.png

Now open the Google Colab: https://colab.research.google.com/github/Jelosus2/Lora_Easy_Training_Colab/blob/main/Lora_Easy_Training_Colab.ipynb

If you haven't trained a LoRA with this Colab before, I recommend doing this step before running it.

Go to the Arc en Ciel Discord server, read the FAQ, the training guides, and the training parameters, and grab the tomls that suit you: https://discord.com/channels/1113893773714399392/1188976775271813200

Screenshot 2025-08-06 200806.png

Git clone the backend script for the Colab: https://github.com/Jelosus2/LoRA_Easy_Training_Scripts

Read the instructions and run it; it should look like this:

Screenshot 2025-08-07 084038.png

Go to File/Load Toml and choose the toml file you just downloaded:

Screenshot 2025-08-07 084700.png

Now just keep it like that and go back to the Google Colab:

Screenshot 2025-08-07 084953.png

Connect to the GPU; click on this part:

Screenshot 2025-08-07 085216.png

Click Manage Resources and check what kind of GPU you're connected to. On the normal Colab tier, the default GPU should be a T4.

I'm on Colab Pro, so I switch my GPU to an L4, which lets me train with bf16 and go faster.

Screenshot 2025-08-07 085012.png
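If you're not sure what the attached GPU can do, you can check from a Colab cell; torch.cuda.is_bf16_supported() is a standard PyTorch call, and a T4 reports False (so use fp16 there), while an L4 reports True:

```python
# Quick precision check for the attached Colab GPU.
import torch

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
    print("bf16 supported:", torch.cuda.is_bf16_supported())
```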

Okay, let's run the first step to download the trainer:

Screenshot 2025-08-07 085123.png

It usually takes about 4m30s to install the trainer. With that much waiting time, I do something else.

While it's running, go to the 2nd step and enter the location of the dataset you saved on Drive:

Screenshot 2025-08-07 085516.png

Go to the 3rd step and choose the checkpoint and VAE you're going to train on. I'm using NOOB-EPS1.1 as the base model:

Screenshot 2025-08-07 085612.png

Should be good now. Open the backend script again and adjust the training parameters if needed. Here is the toml for this one; if you're on a T4 GPU, go to General Args, uncheck BF16, and check FP16:

https://drive.google.com/file/d/1OBlnQOQdpqPAh1k4sEEDRPvKW0DkIrBP/view?usp=sharing

Screenshot 2025-08-07 131618.png

After adjusting the parameters, come back and check whether the trainer has finished downloading, then proceed to the next step:

Screenshot 2025-08-07 085516.png
Screenshot 2025-08-07 085522.png

Now connect to your Google Drive; make sure to read and confirm the prompts -> continue -> continue.

After it successfully connects to Google Drive, run step 3:

Screenshot 2025-08-07 085612.png

After step 3 finishes, go directly to step 5, since you've already set up the dataset:

Screenshot 2025-08-07 085645.png

Run the first 2 cells, copy the dataset paths into your script, then copy the cloudflared link into your script:

Screenshot 2025-08-07 085655.png

Screenshot 2025-08-07 085802.png

Screenshot 2025-08-07 090102.png

Screenshot 2025-08-07 141109.png

Screenshot 2025-08-07 141158.png

Click Start Training and wait for the cloudflared tunnel to be killed, then run the last cell:

Screenshot 2025-08-07 133850.png

The cloudflared link is different because I restarted the cell, so don't mind the previous link :D

Now run the last cell to start the training:

Screenshot 2025-08-07 141752.png

Training:

We went through the whole process of setting up the trainer and starting the run; now let's look at what shows up during training.

You can see some of the training parameters and information here.

And look at the time to finish the last epoch: 8m38s. Super fast :D

Screenshot 2025-08-07 142200.png

Since the training time is short, I can make different versions of this LoRA with whatever parameters I want. I have more time to test, compare, and see which one suits me.

Also, don't mind the average loss. It's there, but it's not important and mostly just confuses people, unless you see your loss suddenly double or triple, even when training with Min SNR :D

Screenshot 2025-08-07 142625.png

As I mentioned before, I'll provide 3 different versions of this LoRA.

They're all trained with the same training parameters; the only difference is the tagging.

I'll show them below:

Screenshot 2025-08-07 171601.png

Screenshot 2025-08-07 232305.png

Screenshot 2025-08-07 171652.png

The screenshots don't show the full tags, but I saved the txt files and attached them to the article; check them yourself if you want.

Testing:

Well, the fun-but-stressful part, here I come :D

I'm going to download all the versions I saved to my Google Drive and make an XYZ grid test.

I'm going to use this prompt to test all versions: pointing at viewer, open mouth, blush stickers, 2girls, white background, simple background, upper body, hatsune miku, kasane teto, looking at viewer,

I'll test with 2 seeds: 1 and 2.

In case you're new to the X/Y/Z plot, here's how to do it:

Go to your WebUI:

Screenshot 2025-08-07 144624.png

In the WebUI, scroll down to the bottom and choose X/Y/Z plot:

Screenshot 2025-08-07 144641.png

Screenshot 2025-08-07 144650.png

Now choose Prompt S/R as the X type.

Choose Seed as the Y type:

Screenshot 2025-08-07 144705.png

Put each LoRA into the X values, separating every value with ",". With Prompt S/R, the first value must appear somewhere in your prompt; the grid then swaps it for each of the other values in turn.

Put seeds 1 and 2 in the Y values:

Screenshot 2025-08-07 161453.png

Here are the results of the 3 versions:

1/ The normal one: trust the dataset processor, partially tag character hair/eye colors, keep the action tag "pointing at viewer"

xyz_grid-0006-1.jpg

2/ Partially tag character hair/eye colors, without the action tag "pointing at viewer"

xyz_grid-0004-1.jpg

3/ Tag all character hair/eye colors, with the action tag "pointing at viewer"

xyz_grid-0008-1.jpg

As you can see, they're all different, even with the same training parameters.

All of them were trained on 6 images with batch size 2, gradient accumulation 2, repeats 4, and the same epochs, LR, and other parameters (6 images × 4 repeats / batch 2 = 12 forward steps per epoch, or 6 optimizer updates with accumulation 2).

With proper tags, the model learns faster and makes fewer errors.

If you don't tag an action properly, the model will have a hard time understanding it: sometimes it will fuck up the anatomy, refuse to learn the concept, or learn very slowly, so you have to bake it with more epochs.

You can also see that the longer it bakes, the more it burns and adds unnecessary details.

Currently, epochs 14-20 look fine. I'll test them more.

I'm going to use random seeds with 2 different characters:

xyz_grid-0011-291511498.jpg

Looks alright, but let's try Regional Prompter, an extension that helps you control each region of the image.

Screenshot 2025-08-07 173058.png

This helps me control the position, action, and expression of each character. Here's the result:

xyz_grid-0012-291511498.jpg

From this testing, I decided to pick epoch 18 as the final product. Cherry-picking is something you have to deal with when testing a LoRA; the pick might turn out good or bad, we don't know.

Let's start testing the LoRA with more characters:

grid-0027.png

Quite alright, hands are weird :D

Now comes the best part, checking whether something is wrong with the LoRA: using it with other LoRAs :D

I'll test with 1 character LoRA first: https://civitai.com/models/890136/tsunomaki-watame-9-outfits-or-hololive-or-noobai-xl-eps-v11-illustriousxl?modelVersionId=1711100

grid-0028.png

You see the color and style are different, right? That's because I removed the tag "blush stickers" from my prompt :D

Here's the one with "blush stickers":

grid-0030.png

But keep in mind the LoRA is fried. This is what the prompt looks like without the LoRA:

grid-0031.png

You'll encounter this a lot when training LoRAs: they get fried, burned, overcooked, intentionally or unintentionally.

For me, this LoRA has some style burned in, but I accept it; at least it doesn't have the bad colors of the one I posted on CivitAI here: https://civitai.com/models/1790780/saiba-momoi-meme-racist-momoi

00005-3854794962.png

The one above has something wrong with the yellow color. Don't ask why the details are more accurate; that's because I inpainted them :D

No more of that kind of yellow in the current LoRA :D

grid-0033.png

The 2 character LoRAs I used above: 

https://civitai.com/models/1083730/irys-8-outfits-or-hololive-or-noobai-xl-eps-v11-illustriousxl?modelVersionId=1745729

https://civitai.com/models/1560692/hakos-baelz-9-outfits-or-hololive-or-noobai-xl-eps-v11?modelVersionId=1766079

Some more images for this LoRA:

89347813-3245212026.png

89347816-497309664.png

LoRAs used:

https://civitai.com/models/882989/doro-3-illustrious-xl?modelVersionId=988415

https://civitai.com/models/1467265/dorothy-nikke-sdxl-lora-illustrious-or-4-outfits?modelVersionId=1659482

https://civitai.com/models/963370/cecilia-immergreen-2-outfits-or-hololive-or-illustriousxl?modelVersionId=1312257

https://civitai.com/models/1173687/gigi-murin-2-outfits-or-hololive-or-illustriousxl?modelVersionId=1320654

FINAL THOUGHTS

That's just one LoRA :D

Each concept LoRA will have its own problems: some are worse to deal with, some are easy to handle, some need a different method.

At the end of the day, we use AI for our own purposes, good or bad, research or gooning; it's all about using it for ourselves.

So just have fun; you can make whatever you want for yourself. But if you want to share the LoRA, try your best to make it compatible for other people to use :D
