Last Update: September 20th, 2024 - added instructions on how to queue training.
Intro
ComfyUI is an incredible tool for AI image generation. Now that FLUX is out and we have discovered that its LoRAs are trainable, someone had the great idea to integrate the Kohya_ss LoRA trainer (one of the most used, if not THE most used, LoRA trainer for Stable Diffusion) into ComfyUI, by creating nodes to load the dataset, the settings and the training modules based on Kohya_ss.
On GitHub, Kijai (Jukka Seppänen) published his ComfyUI Flux Trainer nodes, based on the famous Kohya_ss scripts. And believe me, training in ComfyUI with these nodes is even easier than using the Kohya trainer.
So I decided to make a ComfyUI workflow to train my LoRAs, and here is a short guide to it.
The workflow is composed of 4 blocks: 1) Dataset; 2) FLUX model loader and training settings; 3) Training and validation; 4) End of training. There are also a few orange nodes with notes and explanations about the workflow.
Let's see how each block works.
1) The Dataset
The first step for LoRA training is to prepare the dataset, the set of images that will be used to train the LoRA. The two most important things you have to choose are the number of images and their size. Usually FLUX can be trained with just 10-20 images, but it also depends on what kind of LoRA you want to train: character LoRAs are the easiest and do not need many images to give good results, while for a style LoRA you should use 20-30 images. Do not go too far anyway; FLUX doesn't like large datasets. Size is also important, but you can start with small images (512x512) and the results will be excellent anyway.
Captioning is not necessary, but it can help the training process a lot. If you want to use captions you need to use .txt files named the same way as the images. So if you have 01.png, 02.png, 03.png... images, you will have to write the caption for each image in the 01.txt, 02.txt, 03.txt... files. Use FLUX-style captions, meaning descriptive and verbose ones: you will get better results.
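Before training, it can help to double-check that every image has its caption. Here is a minimal Python sketch (the dataset path is the workflow's default input folder mentioned below; the extension list is my assumption, adjust both to your setup):

```python
# Flag any image in the dataset folder that is missing its matching .txt caption.
from pathlib import Path

dataset = Path("../training/input/")             # workflow default input folder
image_exts = {".png", ".jpg", ".jpeg", ".webp"}  # assumed image extensions

for img in sorted(dataset.iterdir()):
    if img.suffix.lower() in image_exts:
        caption = img.with_suffix(".txt")        # 01.png -> 01.txt
        if not caption.exists():
            print(f"Missing caption for {img.name}")
```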
In the Dataset group you will have to set the path to the image set (default is the "../training/input/" directory) and the "LoraTrigger" word. Switch on the dataset size that you want to use (choose among 512x512, 768x768 or 1024x1024) and you are done here. Remember that you don't need to use square images only; it's actually suggested to mix aspect ratios (1:1 but also landscape and portrait).
"Batch_size" (default is 1): a batch is the number of images that the "trainer" will read at once. So a Batch size of 2 will train two images simutaneously. To train a batch of 2 or more images you need more Vram, but training times will be reduced. It's better to not exceed 4, even if you have a lot of Vram, unless you have a nVidia H100 GPU.
If you have a good GPU and the training times are short, you may be tempted to queue several trainings one after the other. I never tested this (my GPU is not that good), but you should set the "reset_on_queue" value (in the "TrainDatasetGeneralConfig" node) to "true" so that the workflow gets refreshed for a clean new training.
2) FLUX model loader and Training settings
Here comes the magic... and the hard part. First of all you need to select the FLUX model files (blue node). For training with FLUX Dev it's better to use a lighter, fp8 version of the UNet, flux1-dev-fp8.safetensors, that you can find here: https://huggingface.co/Kijai/flux-fp8/tree/main , and save it in the ../ComfyUI_training/ComfyUI/models/unet/ folder.
Another important point Kohya underlined is that you should check your PyTorch version: it is strongly suggested to have at least PyTorch 2.4, as older versions will make the training very slow and time consuming.
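You can verify this from the Python environment that runs ComfyUI; a minimal sketch (it assumes the common packaging module is installed):

```python
# Print the local PyTorch version and warn if it is older than the suggested 2.4.
import torch
from packaging import version  # assumption: packaging is available (pip install packaging)

print("PyTorch:", torch.__version__)
if version.parse(torch.__version__.split("+")[0]) < version.parse("2.4.0"):
    print("Warning: PyTorch older than 2.4 - training will be much slower.")
```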
The other FLUX model files are the same ones you already have if you are using the original FLUX model that came out on August 1st. Anyway, these are the files (a small check script follows the list):
t5xxl_fp8_e4m3fn.safetensors: https://huggingface.co/comfyanonymous/flux_text_encoders/tree/main save it in ../ComfyUI_training/ComfyUI/models/clip/ folder;
clip_l.safetensors: https://huggingface.co/comfyanonymous/flux_text_encoders/tree/main save it in ../ComfyUI_training/ComfyUI/models/clip/ folder;
ae.safetensors: https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main save it in ../ComfyUI_training/ComfyUI/models/vae/ folder.
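And here is the small check mentioned above: a sketch that verifies all four files sit where the workflow expects them (the ComfyUI root path is inferred from the folders listed; adjust it to your install):

```python
# Verify the FLUX model files are in the expected ComfyUI folders.
from pathlib import Path

comfy = Path("../ComfyUI_training/ComfyUI")  # assumed ComfyUI root
expected = [
    comfy / "models/unet/flux1-dev-fp8.safetensors",
    comfy / "models/clip/t5xxl_fp8_e4m3fn.safetensors",
    comfy / "models/clip/clip_l.safetensors",
    comfy / "models/vae/ae.safetensors",
]
for f in expected:
    print(("OK      " if f.exists() else "MISSING ") + str(f))
```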
The next step is to choose the optimizer and the training settings. Here is a link to what Kohya wrote about FLUX LoRA training: https://github.com/kohya-ss/sd-scripts/tree/sd3?tab=readme-ov-file#flux1-lora-training . It is a very detailed article about LoRA training for FLUX and you will find a lot of info there. Another very useful article is at this link: https://github.com/bmaltais/kohya_ss/wiki/LoRA-training-parameters and it's from there that I took the following information.
The optimizer is the method used to update the neural network weights during training.
With the Optimizer switch you can select between two nodes here: Adafactor (suggested by Kohya) or the other node, which lets you choose among AdamW8bit (another optimizer suggested for FLUX), AdamW, Prodigy and CAME. AdamW (which is 32-bit) or AdamW8bit are the second choice for FLUX training: AdamW8bit uses less VRAM and has enough accuracy. Adafactor, on the other hand, adjusts the learning rate appropriately according to the progress of learning while incorporating AdamW's method.
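For the curious, here is an illustrative sketch of how these optimizers are typically created in plain PyTorch code. The trainer nodes do this for you internally, and the exact arguments they pass may differ:

```python
# Illustration only: instantiating the optimizers discussed above on a toy model.
import torch

model = torch.nn.Linear(16, 16)  # stand-in for the trainable LoRA weights

# AdamW (32-bit), the standard baseline:
opt_adamw = torch.optim.AdamW(model.parameters(), lr=4e-4)

# AdamW8bit comes from the bitsandbytes package; less VRAM, similar accuracy:
# import bitsandbytes as bnb
# opt_adamw8bit = bnb.optim.AdamW8bit(model.parameters(), lr=4e-4)

# Adafactor (here from the transformers package) adapts its step sizes as
# training progresses; with a manual lr it needs relative_step=False:
# from transformers.optimization import Adafactor
# opt_adafactor = Adafactor(model.parameters(), lr=4e-4,
#                           scale_parameter=False, relative_step=False)
```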
In the training settings it is important to turn on "split_mode" if you have less than 16GB of VRAM. Remember to set your LoRA's name ("Output name") and the output directory where the LoRAs will be saved.
"Network_dim" and "Network_alpha" are what usually is called Rank in other trainer. The lower the rank is, the longer it will take to train the model, the smaller the LoRA will be and the harder it will be to "overcook" the LoRA. You can try with values of 16, 32, 64 or even 128 for both and see what work better for you. Remember to keep both at the same value.
"Learning_rate": this is a little technical, as this setting is used to change the weight of the wiring in the neural netword, so that a picture that looks exactly like the given training picture can be made. If you tune too much only to the given picture, you will not be able to draw other pictures. To avoid this, the weights are changed each time by "a little amount" to incorporate a little bit more of the given picture. For FLUX can be a value around 1e-4, that is 0,0001, some people use 4e-3, that is 0,004. Default in the workflow is 0,0004 (or 4e-4).
"Gradient_dtype" and "Save_dtype", usually referred as Mixed-precision and Save-precision, specifiy the weight data during training and saving of the LoRA. They are set to bf16 (a data format devised to handle the same numerical width as 32-bit data) by default, but you can try fp16 (for less Vram use and faster training) or even fp32 if you have a good GPU with a lot of Vram. But with bf16 you will have the best from both fp16 and fp32, less Vram use and a good training/saving quality.
"Cache_latents" and "Cache_text_encoder_output": training images during training are compressed to a state (magic dimension) called Latent and are trained in VRAM in this state. You can set where to keep these images during training: to disk, as temporary files, or to memory for faster training.
In the "Init Flux LoRA Training" node, at the bottom, remember to write one or more prompts for LoRA validation during the training. You can write more prompts separating them by "|".
"Text_encoder_lr" set the learning rate for the text encoder. It can be skipped for FLUX training, so it is set to 0.00 by default.
One last thing: remember that the total steps you set ("max_train_steps") must be divisible by 4, and in each LoRA Training group (there are 4 of them) you need to set the Flux Train Loop with the right number of steps for that loop, that is 1/4 of the total steps you set in "Init Flux LoRA Training".
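The arithmetic, spelled out (the step count is just an example):

```python
# Each of the 4 Flux Train Loop nodes gets a quarter of the total steps.
max_train_steps = 3000          # value set in "Init Flux LoRA Training"
steps_per_loop = max_train_steps // 4
print(steps_per_loop)           # 750 -> set this in each Flux Train Loop node
```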
3) Training and Validate
In the workflow you will find 4 training groups like the one above. You will be able to check how the LoRA is coming along with 1 or more images (the example shows 4 different ones) generated with the prompt (or prompts) you set in the "Init Flux LoRA Training" node. Once the training starts you will see these 4 groups progress and show the resulting LoRA applied to the images you prompted for.
4) End of training
Well, after a few hours you will complete your training and you will see an output sheet with all 4 training results and their images in a single grid, for easy comparison. Just choose the best result and you have your LoRA for FLUX! Go to the output folder and you will find the various LoRAs generated during training, saved as .safetensors files. Choose the one you like best and move it to the /models/loras/ folder to use it in your workflows. Have fun!
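If you prefer doing this from a script, here is a minimal sketch (both paths are assumptions based on the folders mentioned in this guide; adjust them to yours):

```python
# List the LoRA checkpoints produced by training, then copy the chosen one.
import shutil
from pathlib import Path

output_dir = Path("../training/output/")                       # assumed output folder
loras_dir = Path("../ComfyUI_training/ComfyUI/models/loras/")  # assumed ComfyUI loras folder

checkpoints = sorted(output_dir.glob("*.safetensors"))
for ckpt in checkpoints:
    print(ckpt.name)

# After comparing the grid, copy your favorite, e.g. the last one:
# shutil.copy(checkpoints[-1], loras_dir)
```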
5) Resume training
To resume training, the first thing to do, before starting, is to set "save_state" to "true" in each of the 4 training groups.
This way the workflow will save the partial training state and it will be possible to resume from there.
To resume, you have to add the "Flux Train Resume" node and point it to the folder where the saved state was written:
The "Flux Train Resume" node must be connected to the "resume_args" input of the "Init Flux LoRA Training" node.
6) FLUX LoRA training guides
As I get many questions about how to train specific kinds of LoRAs, and since I am still learning how to make a LoRA myself (I just created the workflow based on the Kohya trainer), I would like to share some links to articles written by creators who have more experience than me in training LoRAs: