For about two weeks now, Civit has had AI Toolkit as the default trainer. I've spent some time experimenting, dual-training my past few LoRAs to compare results; only recently did I release a LoRA trained solely on AI Toolkit.
Getting good results doesn't require doing much differently; the two main things to keep in mind are:
1- AI Toolkit uses cosine for the scheduler. If you're accustomed to the kohya default of "cosine with restarts" at 3 cycles, you'll need to change some parameters to get similar results. (tldr: cosine means the LoRA has an early spike in learning that decays over time/epochs; cosine with restarts at 3 cycles means the learning decay resets, so you get 3 spikes of learning spread across however many epochs you have.)
What this means is that to get results similar to before, you'll want to repeat the dataset more or add more epochs: repeating the dataset makes the high-learning epochs have more impact, while more epochs make the learning decay go down more slowly.
For example, if you previously trained a character LoRA at 800 steps / 10 epochs, you can get a similar result with 1600 steps / 15 epochs. (See the sketch below for how the two schedules compare.)
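To make the scheduler difference concrete, here's a minimal sketch of the two decay curves. This is the standard scheduler math, not either trainer's actual code; the printed multiplier is the fraction of your base learning rate at each step, and the step counts are just illustrative:

```python
import math

def cosine(step, total_steps):
    """Plain cosine: LR starts at max and decays smoothly to zero once."""
    return 0.5 * (1 + math.cos(math.pi * step / total_steps))

def cosine_with_restarts(step, total_steps, cycles=3):
    """Cosine with hard restarts: the decay resets `cycles` times,
    giving that many spikes of high learning across the run."""
    if step >= total_steps:
        return 0.0
    progress = (step / total_steps * cycles) % 1.0
    return 0.5 * (1 + math.cos(math.pi * progress))

total = 1200
for step in range(0, total + 1, 100):
    print(f"step {step:4d}  cosine={cosine(step, total):.2f}"
          f"  restarts={cosine_with_restarts(step, total):.2f}")
```

Running it, the plain cosine column only reaches high values near step 0, while the restarts column jumps back to 1.00 at steps 400 and 800. That early-only spike is why plain cosine needs more steps/epochs (or more dataset repeats) to put in the same amount of high-LR learning.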
2- Network alpha is set to "32" by default instead of "16". tldr: this value affects how strongly the LoRA learns new concepts and overwrites checkpoint concepts. The recommended setting is normally half the network dim size.
Network dim affects the LoRA's file size. I've played around with dim before; at other dim values you need even more steps and epochs to get similar results. Other dims can work for some people, but I'm comfortable with 32 as the default.
Set network alpha to 16 (half of whatever network dim you're using). See the sketch below for why that halving matters.
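For context on why alpha matters, here's a minimal sketch of the standard LoRA scaling rule (scale = alpha / dim, from the original LoRA formulation, which kohya-style trainers follow). The matrices here are random stand-ins for trained weights, not a real adapter; the point is only the scale relationship:

```python
import torch

torch.manual_seed(0)
dim, d_in, d_out = 32, 768, 768
A = torch.randn(dim, d_in) * 0.01   # down-projection (stand-in for trained weights)
B = torch.randn(d_out, dim) * 0.01  # up-projection (stand-in for trained weights)

for alpha in (32, 16):
    scale = alpha / dim           # the alpha/dim scaling factor
    delta_w = scale * (B @ A)     # the update the LoRA adds to the base weight
    print(f"alpha={alpha}: mean |delta_W| = {delta_w.abs().mean():.6f}")
```

At dim 32, alpha 32 applies the learned update at full strength (scale 1.0), while alpha 16 applies it at half strength (scale 0.5). So leaving the new default of 32 in place effectively doubles how hard the LoRA pushes against the checkpoint compared to the kohya habit of alpha 16.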
Overall thoughts on AI Toolkit.
While having more options is always good, as of now using AI Toolkit is more trouble than it's worth. In theory it's supposed to be a lot faster, but training constantly gets stuck or outright fails. A kohya run of 800 steps / 10 epochs takes around 40 minutes; an AI Toolkit run of 2000 steps / 15 epochs should be even faster, but the training will most likely stall, show no updates, then either update two hours later with all 15 epochs done or show a failed-to-train message.
On another note, it seems they changed some settings on the trainer, and people are complaining about constant pauses. It's happened to me twice, one of those times on a 100% SFW dataset. Both times the training was approved, but it's still annoying, and the response time for the pauses was around 15 hours each time. Imagine paying to be inconvenienced after you were already inconvenienced by having to go through the crypto/gift-card hoops.
For now I don't plan any changes to my postings. I normally don't have problems with the trainer, but follow me on other sites if you care. With the constant changes, who knows what can happen, how long the site will be able to continue, or whether people will be forced to move.
