Update 1 (25/08/24): added a resolution comparison and a trigger word check.
It's been a bit over a week and I have done a fair bit of lora training for flux, so I thought I would jot down my observations so far. There are far more knowledgeable people when it comes to lora training, but since there are not a lot of firm answers right now I figured I would share my two cents.
I have created 18 loras so far, some not yet released (because I hate comparing epochs and created a backlog for myself) and some revisited versions which should be released soon.
Resolution:
I disagree with civitai defaulting to 512 on the onsite trainer.
Resolution feels like it has quite a big impact. For me, training on 1024x1024 seems to give crisper images than training on 512x512. On top of that, I am starting to suspect it also affects the quality of text within the image.
Test results: The first two images are of epoch 15 when trained on 512, the rest are from training on 1024.
Bucketing- Until now I mostly trained on 1024x1024 sized images. I have two loras baking that use bucketing and I hope to update this with something more interesting, but so far it seems to be working well, though maybe not as well as 1024x1024.
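For anyone unsure what bucketing actually does, here is a minimal sketch of the general idea: instead of cropping everything to a square, each image is assigned to a resolution with a similar aspect ratio and roughly the same pixel budget. The function names and the defaults (64 pixel steps, sides between 512 and 2048) are my own assumptions for illustration, not what any particular trainer uses.

```python
def make_buckets(max_area=1024 * 1024, step=64, min_side=512, max_side=2048):
    """Enumerate (width, height) buckets whose sides are multiples of `step`
    and whose area does not exceed `max_area`. All defaults are assumptions."""
    buckets = set()
    for width in range(min_side, max_side + 1, step):
        # Largest height (a multiple of step) that keeps the area within budget.
        height = (max_area // width) // step * step
        if min_side <= height <= max_side:
            buckets.add((width, height))
            buckets.add((height, width))  # mirrored orientation
    return sorted(buckets)


def pick_bucket(width, height, buckets):
    """Return the bucket whose aspect ratio is closest to the image's."""
    aspect = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))


if __name__ == "__main__":
    buckets = make_buckets()
    for size in [(2000, 2000), (1920, 1080), (900, 1600)]:
        print(size, "->", pick_bucket(*size, buckets))
```

Real trainers also resize and crop the image to fit the chosen bucket, which this sketch leaves out.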
Tagging:
Big update! Using existing concepts will push the lora towards the idea flux already has of them, and it will take more steps to train. For example, "samus aran" and "zero suit" are known concepts.
This is trained with the existing tokens at epoch 11
Same data and config with new tokens at epoch 11
Natural language seems to work best. I use Caption Helper (sd-caption-helper.vercel.app) to help with the captioning and then edit the result.
Multiple concepts in the same lora:
My attempts with this mostly dealt with training a character and their known clothes in a way that would let you dress them in other items of clothing. This is definitely possible: repeating the description of the clothing in a very similar way throughout the tagging seems to do the trick. When trying to do more than one item of clothing, though, I got more mixed results for the second description, but this might be because flux prefers a much lower number of images to train on.
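Since the trick relies on describing the clothing the same way in every caption, a quick check can catch captions where the wording drifted. This is a minimal sketch that assumes one `.txt` caption file per image in a folder; the function names, the folder path, and the example phrase are all hypothetical.

```python
from pathlib import Path


def load_captions(caption_dir):
    """Read every .txt caption in a folder into a {filename: text} dict."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(caption_dir).glob("*.txt"))
    }


def missing_phrase(captions, phrase):
    """Return the names of captions that do not contain `phrase`
    (case-insensitive substring match)."""
    needle = phrase.lower()
    return [name for name, text in captions.items() if needle not in text.lower()]


if __name__ == "__main__":
    # Hypothetical folder and phrase, replace with your own.
    captions = load_captions("dataset/captions")
    missing = missing_phrase(captions, "a blue skin-tight bodysuit")
    print(f"{len(captions) - len(missing)}/{len(captions)} captions use the phrase")
    for name in missing:
        print("missing in:", name)
```

Not every caption needs the phrase (for example, images where the character wears something else), so treat the output as a list to review rather than a list of errors.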
Trigger word:
People- For celebs it's best to use a new trigger word and not their name. My theory is that their names were intentionally poisoned so the model won't reproduce their likeness, and that this is also affecting the lora.
Clothes- I experimented with a few methods, from using known tokens (like bikini here) to new trigger words (like here). From what I see, for things that resemble known concepts closely enough, using existing tokens helps speed up the training and lets you train with a lower number of steps. For more foreign concepts, creating a new trigger gave better results but needed a higher step count.
Training:
Smarter people will have better info here but 15-30 pics seems to be the sweet spot for number of images.
Clothes seem to work best at around 500 steps, as they require quite a bit of flexibility, while for people it feels like somewhere between 1000 and 2000.
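To turn a target step count into trainer settings, the usual arithmetic is steps per epoch = images x repeats / batch size, rounded up. Here is a small sketch of that calculation. It assumes kohya-style step counting without gradient accumulation, and the function names are made up for illustration, so check how your own trainer counts steps.

```python
import math


def steps_per_epoch(num_images, repeats=1, batch_size=1):
    """Optimizer steps in one epoch, assuming each image is seen
    `repeats` times and no gradient accumulation is used."""
    return math.ceil(num_images * repeats / batch_size)


def epochs_for_target(target_steps, num_images, repeats=1, batch_size=1):
    """Smallest number of epochs that reaches at least `target_steps`."""
    return math.ceil(target_steps / steps_per_epoch(num_images, repeats, batch_size))


if __name__ == "__main__":
    # 20 images, 5 repeats, batch size 1, aiming for about 500 steps.
    print(steps_per_epoch(20, repeats=5))         # 100
    print(epochs_for_target(500, 20, repeats=5))  # 5
```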
Final thoughts:
I am honestly humbled by the response I have gotten this last week. Seeing people use my loras and share their results makes all the work worth it.
PS:
I hate comparing epochs.
If someone can create me a ComfyUI workflow to compare epochs I will be eternally grateful (plus I am willing to put 1k buzz on the line for it). Turns out the best method was not to use ComfyUI for this but to install Forge and use the good old reliable Prompt S/R.
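For the Prompt S/R route, the X/Y/Z plot script takes a comma-separated list where the first entry is the text to find in the prompt and the remaining entries replace it one by one. Typing out a lora tag per epoch gets tedious, so here is a tiny sketch that builds the list. It assumes the epoch files are named like `my_lora-000005.safetensors` (a hypothetical name with a zero-padded suffix); adjust `digits` or the format if your trainer names them differently.

```python
def prompt_sr_values(lora_name, epochs, weight=1.0, digits=6):
    """Build a comma-separated Prompt S/R list of <lora:...> tags,
    one per epoch. The file naming scheme is an assumption."""
    tags = [f"<lora:{lora_name}-{epoch:0{digits}d}:{weight}>" for epoch in epochs]
    return ", ".join(tags)


if __name__ == "__main__":
    # The first tag must also appear in the prompt so it can be searched for.
    print(prompt_sr_values("my_lora", [5, 10, 15]))
```

Paste the printed line into the Prompt S/R values field and make sure the first tag is the one written in your prompt.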
Special thanks to punzel who made this kickass guide.