My LoRA training breakthrough.

I just had a bit of a breakthrough with my own LoRA training and thought I would share. I was getting OK to poor results up until this point. I was using a set of 2500 regularization images generated by my source model, and I probably did not have the ratio right. After removing the regularization images for now, the outcome is quite good.

I like using the runwayml/stable-diffusion-v1-5 model as my source because the resulting LoRA seems to be very flexible and works with a diverse set of models once you are done. So far I have tested this with 'Photon V1' and 'Reliberate V10'. Wonderful results.

I will list my settings and technique as best I can. If I don't mention something, I didn't change it from the stock settings in Kohya_ss. If you don't know how to install Kohya_ss, there are many good tutorials for that here and on YouTube.

My two datasets were 225 images and 79 images. The only difference is that I used 10 repeats for the 225 set and 20 repeats for the 79 set. I found I did not have enough repeats when I was trying to keep the process under 3500 steps. More repeats really got things moving in the right direction.
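If you want to sanity-check how repeats turn into steps, here is a rough sketch. It assumes one epoch covers images x repeats samples, split into batches and rounded up, with no regularization images. The function names are made up for illustration, and the batch size of 4 comes from the settings further down.

```python
def steps_per_epoch(num_images, repeats, batch_size):
    """Steps in one epoch: images x repeats, split into batches (rounded up)."""
    samples = num_images * repeats
    return -(-samples // batch_size)  # ceiling division without floats


def total_steps(num_images, repeats, batch_size, epochs):
    """Steps for a whole run under the same assumption."""
    return steps_per_epoch(num_images, repeats, batch_size) * epochs


# The two datasets from this post, at batch size 4.
for images, repeats in ((225, 10), (79, 20)):
    per_epoch = steps_per_epoch(images, repeats, batch_size=4)
    print(f"{images} images x {repeats} repeats: {per_epoch} steps per epoch")
```

Multiply the per-epoch number by whichever epoch you end up keeping to see how many steps that checkpoint actually got.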

I am training on an RTX 3080 and I have 32GB of system memory.

I ditched buckets as I don't mind prepping the images very closely. I carefully pull images that resemble what I want the outcome to look like. I also try to select 'strange' angles if possible to give context. That being said, those angles need to support the overall look of the final product. Once I have my images I use a program called 'Inbac' to crop to 512,512. I know buckets exist, but I am a control freak and want to make sure the parts of the image I want get looked at, and I also think it helps training run smoother. Inbac is great because it was built for speed and you can set your aspect and resolution to get what you want FAST. I can run through 225 images in minutes.
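To show the idea of a fixed-aspect crop around the part of the image you care about, here is a minimal sketch. This is not how Inbac works internally. It just computes the largest square box that stays inside the image and sits as close as possible to a focus point you pick. The function name and the focus point are hypothetical, and scaling the crop down to 512,512 would be a separate step in whatever image tool you use.

```python
def square_crop_box(width, height, focus_x, focus_y):
    """Return (left, top, right, bottom) of the largest square near the focus point."""
    side = min(width, height)
    # Center the square on the focus point, then clamp it to the image bounds.
    left = min(max(focus_x - side // 2, 0), width - side)
    top = min(max(focus_y - side // 2, 0), height - side)
    return (left, top, left + side, top + side)


# A 1024x768 photo with the subject toward the right edge.
print(square_crop_box(1024, 768, focus_x=900, focus_y=384))
```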

https://github.com/weclaw1/inbac

Then I use the utilities to caption using WD14 and just let it rip. I don't try to exclude anything.

I use BooruDatasetTagManager to edit the tags. I cut out everything I don't want to change, and leave everything else in. I don't actually edit the tags on every image, just the overall tags. I especially cut out the eye color, hair color, etc. because it does get it wrong a lot and it just confuses things. Also some weird tags get in there like 'horror(theme)' which I definitely want to get rid of.
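If you would rather script the bulk tag removal, here is a small sketch under a few assumptions: the captions are comma-separated tags in .txt files next to the images, and the blocked tags and suffixes below are only examples, not an exact list from my datasets. BooruDatasetTagManager does the same job in a GUI.

```python
from pathlib import Path

# Example filters only; adjust to whatever you want removed from the captions.
BLOCKED_TAGS = {"horror (theme)"}
BLOCKED_SUFFIXES = (" eyes", " hair")  # coarse: also drops tags like "short hair"


def clean_caption(caption, blocked_tags=BLOCKED_TAGS, blocked_suffixes=BLOCKED_SUFFIXES):
    """Remove blocked tags from one comma-separated caption string."""
    tags = [tag.strip() for tag in caption.split(",")]
    kept = [
        tag for tag in tags
        if tag and tag not in blocked_tags and not tag.endswith(blocked_suffixes)
    ]
    return ", ".join(kept)


def clean_folder(folder):
    """Rewrite every .txt caption file in a folder in place."""
    for path in Path(folder).glob("*.txt"):
        cleaned = clean_caption(path.read_text(encoding="utf-8"))
        path.write_text(cleaned, encoding="utf-8")


print(clean_caption("1boy, blue eyes, suit, horror (theme), brown hair"))
```

Back up the caption folder before running something like clean_folder, since it overwrites the files.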

https://github.com/starik222/BooruDatasetTagManager

After that it's training time!

In Dreambooth LoRA I set my folders. I create a directory for my LoRA, and within it I make images, log, and output folders, plus one for my source (non-cropped) images. I copy my dataset images into the source images directory and crop them. Then I rename the cropped folder using the proper syntax, making sure my 'class' tag (man, woman, etc.) is in the folder name inside the images directory. Again, I used a repeat of 10 for a 225 image set, and 20 for a 79 image set. Your final directory holding your images should look something like:

#ofRepeats_NameOfLora class

20_BusinessMan man (for example)
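Here is a quick sketch that builds that layout, in case you set up a lot of these. The images, log, and output names follow what I described; the "source" folder name and the function name are just placeholders I picked for the example.

```python
from pathlib import Path


def make_lora_dirs(root, name, class_tag, repeats):
    """Create the LoRA project folders and return the training image folder."""
    root = Path(root)
    # Training folder name follows the pattern: #ofRepeats_NameOfLora class
    train_dir = root / "images" / f"{repeats}_{name} {class_tag}"
    for folder in (train_dir, root / "log", root / "output", root / "source"):
        folder.mkdir(parents=True, exist_ok=True)
    return train_dir


train_dir = make_lora_dirs("BusinessMan_lora", "BusinessMan", "man", 20)
print(train_dir.name)
```

The cropped images and their caption files go in the folder it returns.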

OK. Settings. Here we go. I will list the things I changed.

Train batch size: 4 -- I am using a 3080 so I have enough VRAM. If you train at 1 I would reduce the learning rate or decrease the repeats. However, I believe more epochs are better, so I suggest keeping those high. You can always use an earlier epoch if need be.

Mixed precision: bf16 -- again, I am using an RTX 3080. If you are using a 20xx card or older, use fp16.

Epoch: 20 -- I don't usually end up using the 20th Epoch, but I want to make sure I have enough to choose from. You really do need to see which one is the sweet spot at the end.

Save every N epochs: 1 -- You want to have choice at the end.

Learning rate / Unet learning rate: 0.0001 (I use that for both)

LR Scheduler: cosine with restarts

Network Rank: 64

Network Alpha: 32 -- try lower if you want, this worked for me

Turn buckets off if you have 512,512 images

Max Resolution: 512,512

Then in Advanced -- LR number of cycles: 12
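To give a feel for what a couple of these numbers do, here is a rough sketch. It assumes the LoRA update is scaled by alpha / rank and that "cosine with restarts" follows the common hard-restart cosine formula with no warmup. I have not checked that Kohya_ss computes it exactly this way, and the function names and the 1200 step total are made up for the example.

```python
import math

# Values from the settings list above.
NETWORK_RANK = 64
NETWORK_ALPHA = 32
LEARNING_RATE = 1e-4
LR_CYCLES = 12


def lora_scale(alpha, rank):
    """Assumed scaling applied to the LoRA update: alpha / rank."""
    return alpha / rank


def lr_at(step, total_steps, base_lr=LEARNING_RATE, num_cycles=LR_CYCLES):
    """Learning rate at a step for cosine with hard restarts, no warmup (assumed)."""
    progress = step / max(1, total_steps)
    if progress >= 1.0:
        return 0.0
    # Position inside the current cycle, from 0.0 up to (but not including) 1.0.
    cycle_pos = (num_cycles * progress) % 1.0
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * cycle_pos))


print("LoRA scale:", lora_scale(NETWORK_ALPHA, NETWORK_RANK))

total = 1200  # hypothetical total step count, so each of the 12 cycles is 100 steps
for step in (0, 25, 50, 75):
    print(step, lr_at(step, total))
```

With rank 64 and alpha 32 the scale comes out to 0.5, so dropping alpha further at the same rank weakens the update in a similar way to lowering the learning rate. The schedule starts each cycle at the full learning rate and decays toward zero before jumping back up.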

That's it. I don't have it generate an image at every Epoch because it is using the base model and it always looks crazy for me. I will test the Epochs on another computer as they complete if I am in a rush, but mostly I just wait for it to be done and then start trying. I usually start with Epoch 10; if there are artifacts I work back, and if it isn't cooked enough I work forward.

Edit 7-19

After more work, I think I might have been advocating for too many repeats. Fewer repeats and more Epochs might be better. Other than that, I am still getting great results with the other settings. If you want to give this a try, set your repeats to 5 and Epochs to 50 and see where you land. Using these settings I am able to pull a full LoRA in 15-20 minutes with a 3080.
