Full Checkpoint Training in Kohya
9/29/2024 - The latest updates to the Kohya_SS SD3-FLUX1 fork break this config as-is. The setting you have to change: move the double block swap number over to the far-left "Blocks to swap" field (see the sketch below). If you don't, you will get OOM errors. It took me a while to figure out why my saved config would not resume after updating, and this appears to be the solution. I am testing the new config with xformers enabled and the updated PyTorch.
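For anyone running from the command line rather than the GUI: as I understand it, the underlying sd-scripts FLUX branch deprecated the per-block-type swap options in favor of a single combined one, and the GUI's far-left "Blocks to swap" field maps to that newer flag. The value of 6 below is just an illustration, not a recommendation:

```bash
# Older SD3-FLUX builds exposed per-block-type swapping (since deprecated):
#   --double_blocks_to_swap 6
# Newer builds use a single combined option, which the GUI's far-left
# "Blocks to swap" field maps to:
#   --blocks_to_swap 6
```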
With FLUX training being in its infancy, I wanted to provide some information to those just getting started. It may save you some hours of trial and error. This is not a tutorial, and it is definitely not for beginners to training.
The attached config file is for the Kohya_SS GUI, but you must be using the SD3-FLUX branch of Kohya for it to work. The config is for training a CHECKPOINT (not a LoRA) in the Dreambooth training tab. Using this config, I was able to successfully train the FLUX Dev UNet on a character likeness with great results.
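For orientation, the GUI is a wrapper around kohya-ss/sd-scripts, and a full checkpoint fine-tune boils down to a `flux_train.py` call along these lines. This is a generic sketch based on the sd-scripts FLUX fine-tuning documentation, not a dump of the attached config; all paths, the learning rate, step count, and swap value are placeholders, and the exact flags in your saved config may differ:

```bash
accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 flux_train.py \
  --pretrained_model_name_or_path /path/to/flux1-dev.safetensors \
  --clip_l /path/to/clip_l.safetensors \
  --t5xxl /path/to/t5xxl_fp16.safetensors \
  --ae /path/to/ae.safetensors \
  --dataset_config /path/to/dataset.toml \
  --output_dir /path/to/output --output_name my-character-flux \
  --save_model_as safetensors --save_precision bf16 --full_bf16 \
  --mixed_precision bf16 --gradient_checkpointing --sdpa \
  --optimizer_type adafactor \
  --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" \
  --learning_rate 5e-5 --max_train_steps 2000 \
  --timestep_sampling shift --discrete_flow_shift 3.1582 \
  --model_prediction_type raw --guidance_scale 1.0 \
  --fused_backward_pass --blocks_to_swap 6 \
  --cache_latents_to_disk --cache_text_encoder_outputs_to_disk
```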
For me, this has two applications:
1.) I then extracted the LoRA from the checkpoint (see the extraction sketch after this list). Per previous articles on here, I concur that this results in a superior-quality LoRA. Of course, it comes at a time cost; for simple LoRAs, AI-Toolkit or plain LoRA training would still be my preferred method. But I did find that extracting the LoRA from a trained checkpoint gave better quality, and I also have the full trained checkpoint for that character. I cannot share the comparisons because the subject of the LoRA does not want their likeness shared publicly.
2.) This is a jumping-off point for me to experiment with FLUX checkpoint training and start seeing what is possible with larger datasets or styles (not just one person).
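If you prefer doing the extraction from the command line instead of the GUI, the sd-scripts FLUX branch ships an extraction script; something along these lines should work. The script name and flag names here follow the older extract_lora_from_models.py convention and are my assumption for this branch, so check `--help` on your checkout; paths and rank are placeholders:

```bash
# Extract a LoRA as the difference between the base model and the trained checkpoint
python networks/flux_extract_lora.py \
  --model_org /path/to/flux1-dev.safetensors \
  --model_tuned /path/to/trained-checkpoint.safetensors \
  --save_to /path/to/extracted-lora.safetensors \
  --dim 32
```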
Some notes:
I am not claiming this is optimized. Just that it worked and worked well to my liking.
I am sharing because people seem to hoard these configs or put them behind paywalls. Personally, I want to help anyone interested in contributing get started.
This ran for me on an RTX 4090. It stayed consistently around 22.5GB of VRAM, so I don't think this config will run on a lesser card. I know there are ways to reduce that further, but I stopped minimizing once I got it running on my hardware.
My dataset was roughly 78 images. I did not use regularization images; those have been shown to help with LoRA training, though, so I may test them here in the future.
The optimal number of steps will probably vary with the quality and quantity of your source data. (This was not an optimal dataset for me, but it still converged very quickly, IMO.)
It ran at about 6 sec/it, so it is much slower than LoRA training alone. As a rough example, at 6 sec/it, 1,000 steps takes about an hour and 40 minutes.
I put tips in the config on what to link where, so when you load it you know where to point the paths to your specific files.
It took me three trial runs to get it dialed in to a working result; this may save somebody else those headaches. I plan to try a much larger dataset at a slower learning rate next, but that will probably wait until I am on vacation so it does not tie up my PC while I am at home.
Finally, I am just learning by trial and error. I am not a resource for issues with Kohya or training in general; I would point you to their very active GitHub discussions for technical questions. I am just hoping to provide someone the head start I wish I had been able to easily obtain.