
My Thoughts and Process on LoRA Training for Flux


I just realized today (2025-02-16) that I have trained and uploaded over 250 LoRAs to CivitAI since the end of October. A few people have asked how I train them, so here's the explanation.

Only train what you need for a coherent model and nothing more.

Setup

I use Fluxgym running inside Pinokio (https://pinokio.computer/), which is as close as possible to an AI appliance. It sets up and maintains all the requirements and virtual environments without requiring me to tinker with anything. Thus, I can spend time curating datasets and training the LoRAs, not futzing with the tool.

Dataset Curation

Fluxgym (and Flux in general) makes extending the models with minimal datasets easy. I locate about 18 good-quality images (or the best I can find) that capture what I want the LoRA to represent. Since I specialize in Celebrity LoRAs, there are specific guidelines I follow (there's a quick dataset check sketched after this list):

  • Always use images when the subject is 18+

  • If their physical appearance changes over time, create versions for each look

  • If they are known for specific physical attributes, I try to make sure I capture those, for example:

    • Dark hair with bangs

    • Big breasts

    • Singing

  • Simpler poses, but not all the same (standing, sitting, walking).

  • A mixture of close-up, portrait, and full-body shots - roughly 80%/10%/10%

  • High resolution, if possible. At least one dimension greater than 1024 pixels.

  • If all you can find are NSFW images, be prepared for some strange outcomes. With more recent celebrities, I generally find it easier to find some SFW images, even if provocative, since they post on social media.

  • Make everything PNG images. Fluxgym likes them better.
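
If you want to sanity-check a dataset folder against these guidelines before loading it into Fluxgym, a minimal script along these lines will do it. The folder path and target count are just examples, and it assumes Pillow is installed:

```python
# Quick sanity check of a dataset folder against the guidelines above.
# The folder path and target count are examples; requires Pillow (pip install Pillow).
from pathlib import Path
from PIL import Image

DATASET_DIR = Path("datasets/my_subject")  # example path
MIN_LONG_SIDE = 1024                       # at least one dimension >= 1024 px
TARGET_COUNT = 18                          # roughly how many images I aim for

image_files = [p for p in sorted(DATASET_DIR.iterdir())
               if p.is_file() and p.suffix.lower() != ".txt"]
problems = []

for path in image_files:
    if path.suffix.lower() != ".png":
        problems.append(f"{path.name}: not a PNG")
        continue
    with Image.open(path) as img:
        width, height = img.size
    if max(width, height) < MIN_LONG_SIDE:
        problems.append(f"{path.name}: {width}x{height}, long side under {MIN_LONG_SIDE}px")

print(f"{len(image_files)} images found (target is roughly {TARGET_COUNT})")
for problem in problems:
    print("  -", problem)
if not problems:
    print("Dataset looks ready for Fluxgym.")
```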

I have a background in machine learning and try to apply the same principles for LoRA training.

Only train what you need to make a coherent model and nothing more.

This works well for Flux LoRAs, but SDXL, Pony, and other base models require more images and different selection processes. However, the same principle applies.

Fluxgym Setup

Once I have the datasets/images ready, they go into the training queue, which I track in a spreadsheet.

Here is where Fluxgym comes in. I start from Pinokio and load the images once the Gradio screen appears.

I fill in the LoRA's name and copy and paste it into the trigger word slot. Flux doesn't necessarily need a trigger word, but Fluxgym likes it. I choose flux-dev as the base model. There are others, and you can upload your own, but I have never tried anything else. I then set memory (VRAM) to 12 GB, repeat trains per image to 5, and epochs to 32. I don't create sample images anymore; I have done this so much that I don't need to see the progress.

  • 5 repeats - limits the impact of the bad stuff in each image. Is it grainy or low quality? Are there marks on the image? Fewer repeats keep those artifacts from getting "burnt into the model."

  • 32 epochs - With fewer repeats, the model does not show a likeness of the subject until 7-8 epochs. With more repeats, you can see the subject's likeness in fewer epochs.

  • Everything else is set to defaults. Fluxgym is a simplified Gradio interface to Kohya_SS optimized for Flux. They spent a lot of time figuring out the optimal settings, and I am happy with what they chose.

I arrived at these settings through some trial and error. I used to source more images, like 40, do more training per image, and do fewer epochs. I found that while these worked, finding the images and training the LoRA took longer.
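
The time difference is easier to see if you work out the total number of training steps (images x repeats x epochs). This is just back-of-the-envelope arithmetic assuming a batch size of 1, and the old repeat/epoch numbers are illustrative rather than an exact record of what I used:

```python
# Back-of-the-envelope step counts, assuming a batch size of 1.
images, repeats, epochs = 18, 5, 32           # current settings
print(images * repeats * epochs)              # 2880 total training steps

# Old approach: more images and more repeats per image, fewer epochs.
# (The old repeat/epoch numbers here are illustrative, not an exact record.)
old_images, old_repeats, old_epochs = 40, 10, 8
print(old_images * old_repeats * old_epochs)  # 3200 total training steps, plus more curation time
```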

After all this, I move on to Captioning. Fluxgym generates captions using Florence2 and does a generally good job, but I usually check them out before beginning to train.

Captioning The Sample Images

As I said, the auto-caption does a good job but can use some help. It tends to be verbose and creates more detailed descriptions than necessary.

It will often create a story about how the subject is in pain because they were just told they have some disease. Sometimes, it will actually name the subject or misname them as someone more popular (you would be amazed at how many Dolly Partons and Marilyn Monroes show up).

If you added a trigger word, that will show up first. I like to keep it simple: "xxxxx a woman standing and smiling," and then let the training fill in the rest. In my experience, this keeps the model more flexible. If the subject has anything specific to them, I will add it, such as long, dark hair or large breasts.

  • Example: "xxxxx a woman with long, dark hair and bangs is smiling"
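
If you would rather batch-check the captions than click through them one by one, and your dataset folder has the usual Kohya-style .txt caption file next to each image, a small script like this can make sure the trigger word leads each caption and flag anything too wordy. The path, trigger word, and word limit are placeholders:

```python
# Review pass over Kohya-style sidecar captions (.txt next to each image).
# The folder path, trigger word, and word limit are placeholders.
from pathlib import Path

DATASET_DIR = Path("datasets/my_subject")
TRIGGER = "xxxxx"          # whatever you entered as the trigger word
MAX_WORDS = 15             # flag captions that are getting story-like

for txt in sorted(DATASET_DIR.glob("*.txt")):
    caption = txt.read_text(encoding="utf-8").strip()

    # Make sure the trigger word leads the caption.
    if not caption.startswith(TRIGGER):
        caption = f"{TRIGGER} {caption}"
        txt.write_text(caption, encoding="utf-8")

    # Surface anything verbose so it can be trimmed by hand before training.
    if len(caption.split()) > MAX_WORDS:
        print(f"{txt.name}: {caption}")
```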

Someone brought up the good point that including physical or other characteristics can limit the LoRA. That is very true. Only include them if you really want the LoRA to enforce those characteristics on top of whatever the generation prompt says.

Only train what you need to make a coherent model and nothing more.

Once all that is done, I push the train button and let the model bake. With these parameters, it takes about 5 hours on my system, compared to 8-10 hours with my earlier settings.

Model Done

At the end, I have eight saved epochs and move the last one over to the machine where I do my sample images and other tests. But that is for another day. :)

Thanks, and I hope you found some of this helpful.
