Hello everyone!
This is going to be a "short" tutorial/showcase on how I train my Flux loras.
Originally I wanted to make one big tutorial for training and generating but I'm splitting it into two parts and the second one should come "shortly" after.
I know I'm kinda late to the party with it, but perhaps someone would like to use my flow :)
(there is an attachment archive with my settings that I cover in this article)
Since I train on an RTX 3090 with 24 GB of VRAM, I am not using any memory optimizations. If you wish to try my settings but have less VRAM, you can apply the known options that bring memory requirements down, but I cannot guarantee the quality of the training results in that scenario.
Setup
I am using kohya_ss from the branch sd3-flux.1 -> https://github.com/bmaltais/kohya_ss
If you already have a kohya setup but are training different models and are not on that branch, I suggest duplicating the environment so that you do not ruin your current one (the requirements are different, and switching back and forth between branches might not be a good idea).
I do have a separate env for 1.5 loras/embeddings training using kohya and I created a separate one for Flux.
Additionally, I am using a snapshot from 18th September (commit: 06c7512b4ef67ae0c07ee2719cea610600412e71)
git checkout 06c7512b4ef67ae0c07ee2719cea610600412e71
If you have problems reproducing the quality of my models, perhaps you should switch to that snapshot, but I suggest starting from the latest one.
In my experience, it is better to be safe than sorry as it is possible that backward compatibility could be broken.
Case in point: my dreambooth training setup, which I still use (for LyCORIS extraction), is pinned to a snapshot from almost two years ago. I found myself wondering why my training lost quality when I moved to Runpod, and as it turned out, updating accelerate, transformers and one more library was what did it. As soon as I went back to the exact versions that I used on my local machine, the quality of the training was restored.
I'm not saying that the latest branch won't work, but I can't guarantee that in 2-3 years' time (if we are even still training Flux) the up-to-date repo will still train the same way it does now.
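One way to guard against that kind of library drift (my own habit, not part of kohya itself) is to freeze the exact package versions of an environment that you know trains well:

```shell
# Save the exact library versions of a known-good environment to a lock file.
python -m pip freeze > requirements-lock.txt
# Later, on a fresh machine (e.g. Runpod), restore those exact versions with:
# python -m pip install -r requirements-lock.txt
```

That lock file is what lets you recreate the same accelerate/transformers versions months later instead of guessing which update broke things.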
With that out of the way, let's focus on the training script itself.
Training scripts
First and foremost, I do not use the GUI at all (the one time I use it is to get the config files and execution paths); for me, it is always straight from the console.
There are two main reasons for this:
* you have all the settings saved and you can just easily replace one or two variables (usually the model name and filepath)
* you can easily set up more than one training (great when you want to train multiple models while you're asleep or at work)
In kohya you can run the training script as a one-liner or you can load the settings from a toml file. I'm using the second way.
Here is my script; it uses two placeholder paths:
/path-to-kohya_ss
- this is just a path to your kohya_ss with the flux branch
/path-to-setting-file/settings.toml
- this is the path to the toml file that has all the settings
Linux execution script (you could name it `train.sh`, for example):
/path-to-kohya_ss/venv/bin/accelerate launch --dynamo_backend no --dynamo_mode default --mixed_precision bf16 --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 /path-to-kohya_ss/sd-scripts/flux_train_network.py --config_file /path-to-setting-file/settings.toml
Windows execution script (you could name it `train.bat`, for example):
cd /d path-to-kohya_ss
call ./venv/Scripts/activate.bat
accelerate launch --dynamo_backend no --dynamo_mode default --mixed_precision bf16 --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 path-to-kohya_ss/sd-scripts/flux_train_network.py --config_file path-to-setting-file/settings.toml
where `path-to-kohya_ss` should be something like `C:/path-to-sd-stuffs/kohya_ss`, and `path-to-setting-file` should be the path to where your settings toml file is (same convention as the path to kohya).
Please note that even though this is Windows, the paths here are Linux-like (`/`) and not Windows-like (`\`) :-)
The toml file has been attached to this article, but let me explain some values from it.
These are values you only have to change once:
at the very top of the .toml file
ae = "/media/fox/data2tb/flux/ae.safetensors"
output_dir = "/media/fox/data2tb/tmp"
pretrained_model_name_or_path = "/media/fox/data2tb/flux/flux1-dev.safetensors"
clip_l = "/media/fox/data2tb/flux/clip_l.safetensors"
t5xxl = "/media/fox/data2tb/flux2/t5xxl_fp16.safetensors"
sample_prompts = "/media/fox/data2tb/tmp/sample/prompt.txt"
ae, pretrained_model_name_or_path, clip_l, t5xxl - these are the paths to the Flux models; since you're training Flux, you should be familiar with them, so just point them to where you have the files
output_dir - this is the output folder of the trained model(s)
sample_prompts - this is a file containing sample prompts; even though I am not using sample prompts during training, I still have to have this .txt file (just put anything there, like photo of yourtoken)
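Creating that placeholder file is a one-liner. The token and the filename below are just examples; put the file wherever your sample_prompts path points:

```shell
# The sample prompt file only needs to exist; one line of anything is enough.
# Adjust the filename/path to match sample_prompts in your .toml.
printf 'photo of sks woman\n' > prompt.txt
```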
at the bottom of the .toml file
save_every_n_steps = 100
resolution = "1024,1024"
max_train_steps = 400
train_data_dir = "/media/fox/data2tb/flux/training-data/sets9/amyadams/flux/img"
output_name = "flux_amyadams_v1"
The first two you can also configure once and forget, but you may want to play with them occasionally.
resolution - I tried 512,512 and was having okay results, but I have switched to 1024,1024 and I do believe that I'm getting even better results. If you have enough memory, go for 1024; if you are on the lower side, this might be the place where you need to go lower
max_train_steps - this one is important because (as with my small 1.5 loras/embeddings) I'm not relying on epochs for training and the unintuitive step computations that come with them. I just set a hard cutoff point, which in my case is 400 steps.
save_every_n_steps - we are mostly interested in the 300-step and 400-step snapshots; if you feel like smaller granularity might serve you better, go for 50.
In most cases the best training will be with 400 steps; however, I have found that occasionally the best one is actually either 300 or even 500.
400 works quite well, so that is my go-to. I still can't pinpoint what makes one model better than another, so to be on the safe side I snapshot every 100 steps; that way I still have access to 300 in case 400 seems to be overtrained.
If your goal is to have a good enough match, doing 400 steps will be fine. However, if your intention is to have a perfect match, you will most likely need to take a more cautious approach:
* generating even up to 700 steps
* generating more than one model and then using two (or more) together in combination with different weights (I will explain this in another article that should come out shortly after this one; I decided to split this into a "training" guide and a "usage tips/tricks + my observations" article)
output_name is the name of the output model, without extension (it will also get suffixed with the number of steps)
train_data_dir is the folder where you have your dataset images, but there is one thing that you should know:
let's say your training data dir points to a folder img (`/home/user/flux-data/img` or `C:/my-stuff/flux-data/img`)
you need to put your dataset images in a subfolder named like 100_token class. When I train a woman I use 100_sks woman, and when I train a man I use 100_sks man.
If you want your token to be more than one word, you can do that, for example: 100_billie eilish woman
You can also train other concepts, like styles: 100_wasylart style, or anything else. Please do remember, though, that these params were picked for training people; with other concepts you may need to train shorter or longer (you just need to test it out).
You are probably wondering why 100_; well, this is kohya's way of indicating how many times to repeat the images per epoch, but since we're using a step cutoff (max_train_steps) this doesn't really matter, we just don't want training to finish too early, hence 100.
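To see why the repeat prefix barely matters here, a quick back-of-the-envelope check (the image count and batch size are assumptions for illustration):

```shell
# Why the "100_" repeat prefix doesn't matter once max_train_steps caps training.
num_images=20    # assumed dataset size
repeats=100      # the "100_" folder prefix
batch_size=1     # assumed train_batch_size
steps_per_epoch=$(( num_images * repeats / batch_size ))
echo "$steps_per_epoch"  # 2000 steps in one epoch, far above the 400-step cutoff
```

With any reasonable dataset size, one "epoch" at 100 repeats is already thousands of steps, so the hard max_train_steps cutoff always ends training first.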
Datasets
I had success training with as little as 15 images and as many as 60. You could most likely go in both directions but I just didn't test the limits.
However, after multiple Flux trainings I am personally leaning towards using around 20 images that best represent a person.
When I was using more, one of two things could happen:
* some of the images, though they looked nice and had great resolution, sometimes didn't really capture the essence/likeness of the person, and Flux is quite perceptive; it is better to have fewer images that really show the likeness of the person (different makeup, angle, or lighting can sometimes give a person an unnatural look)
* with more images I sometimes got models that seemed undercooked at 400 steps; it could be related to the previous point: because of the differences between the images, the training just couldn't converge on the concept as well
As for the images themselves, bucketing is enabled in the settings and I am not cropping them as I did for 1.5/SDXL.
I was cropping early on, but once I stopped I didn't really observe much difference, and one less step is always nice.
I still filter out images that are blurry, have obstructed faces, or show multiple people. I make sure that either the face or the body is visible
(yes, not every image needs to have a face; this can be handy if you have, for example, photos of tattoos and you want Flux to learn those too)
I still maintain the opinion that the dataset is the key element of the whole training. You can of course mess up the training parameters but if you mess up the dataset - you won't be able to train anything good.
The resolution of images doesn't seem to matter all that much, but in general - higher resolution usually means better quality images and you want the best quality in your dataset.
Various
I think I've made around 150 Flux trainings by now. Twice I had a really weird situation where the trained model was "broken": it was generating pure noise (even the lower-step snapshots had that issue). It could be some problem on my end (GFX card?), because rerunning the training without any changes to the config or dataset fixed it.
I have no idea what it was, but I mention it in case your first-ever training gives you white noise.
There is also an interesting thing relating to the lora's trigger token. I train with my regular sks token, but I sometimes use different tokens too.
I found out by accident that the token doesn't really do much if anything at all while training Flux on kohya_ss (at least with the settings I have).
You can prompt for sks woman or woman or person and you will still get what you trained. I keep sks in my prompts, but I found it quite interesting.
It could be related to the concept bleeding in Flux training that some people talk about.
Lastly, how long does the training take?
On my 3090 with 24 GB of VRAM, it takes around 2 hours and 15 minutes, but preprocessing also takes some time, so in total it is around 2.5 hours for 400 steps.
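For what it's worth, those numbers work out to roughly 20 seconds per step of pure training:

```shell
# Per-step speed implied by the timing above (excluding preprocessing).
awk 'BEGIN { total = 2*3600 + 15*60; steps = 400; print total/steps " seconds per step" }'
```

Useful as a sanity check: if your per-step time is wildly different on similar hardware, something in the setup is probably off.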
That concludes the first part - training. If you have questions, please leave a comment :)
In the next episode
The second part of the guide will focus on the following:
* general prompting using my models
* which will include my thoughts on the base model(s)
* which steps snapshots and what strengths to use
* I will discuss the multi-lora approach (which most of you might be familiar with from 1.5/SDXL); spoiler: it also works really well here and I strongly suggest using it
* and lastly, I will cover using additional loras while generating
I would love to give a shoutout to https://civitai.com/user/Sacchan who recently started doing models as well and for the most part, follows my guide :) (but makes much better sample images :P)
Be sure to check the models, I'm confident you won't be disappointed :)
Cheers!
Malcolm