Training a LoRA is like standing in a supermarket, watching cooking tutorials to learn a few recipes.
The pretrained base model is the supermarket.
The tutorials are the images in your training set.
The dim parameter (the network rank) is the thickness of your notebook.
The alpha parameter is how much you take in each time you look up at the tutorial.
If both dim and alpha are high, you can write down a lot very quickly. But sometimes what you want to learn is really simple, like how to boil water; you don't need such a thick notebook, and you don't need to study that carefully.
If alpha is relatively low and the learning rate stays the same, you will learn slowly, so you need to raise the learning rate or train for more epochs.
That said, a small dim and a small alpha can still work: simmer it slowly and you end up with a finely written pocket cheat sheet crammed with tiny notes.
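To pin the analogy to the actual mechanics: in common trainers, dim is the rank of the two small matrices that get added to each weight, and alpha typically enters only as an output scale of roughly alpha / dim. That is why lowering alpha with everything else fixed shrinks the effective step size and has to be bought back with a higher learning rate or more epochs. Below is a minimal PyTorch sketch of that scaling; the class name and default values are illustrative, not taken from any particular trainer.

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper: y = W x + (alpha / dim) * B(A(x))."""

    def __init__(self, base: nn.Linear, dim: int = 16, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # the "supermarket" stays frozen
            p.requires_grad_(False)
        self.down = nn.Linear(base.in_features, dim, bias=False)  # A: dim rows = notebook thickness
        self.up = nn.Linear(dim, base.out_features, bias=False)   # B
        nn.init.zeros_(self.up.weight)            # start as a no-op: an empty notebook
        self.scale = alpha / dim                  # alpha only rescales the learned update

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))
```

So a thicker notebook is a larger dim (more trainable rows in A and B), while alpha only changes how loudly those notes are read back into the model; halve alpha and you roughly halve the size of each learning step.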
An overfitted LoRA is one that has learned too much and copied down too many unnecessary details. You only wanted tomatoes with scrambled eggs, but now it insists on tomatoes from one particular town, eggs from one particular hen, the same apron, and the same wok, so a messy result is guaranteed.
If you delete a tag from the training captions, the feature that tag described does not disappear. It migrates onto some other tag, or, if it cannot find any tag to bind to, it becomes the LoRA's "baseline tint". In notebook terms: the notes used to say "add the scrambled eggs" and "add the tomatoes" on two lines; now there is only one line, "add the scrambled eggs", but that line has quietly acquired the extra meaning "and also put in something red and round".
The result is certainly convenient for whoever uses the LoRA, but "add the tomatoes" has lost its independence, and the LoRA has lost its notion of what a tomato is. If the base model you trained on and the base model you generate with differ enough in style, what gets thrown into the pan may no longer be a tomato but an apple, or even a clown's nose.
Train on an anime base model, and then someone insists on using the LoRA with a photorealistic model: the gap may not be the difference between a Chinese supermarket and an American one, but between a human supermarket and an alien one. Perhaps the alien supermarket stocks no Earth tomatoes at all, and the closest thing to "red and round" is a lump of alien droppings. On top of that, some people love pouring in additives like doll_likeness at several times the recommended dose, so how could the LoRA not end up tasting off?
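For concreteness, "deleting a tag" just means editing the caption files that sit next to the images. Here is a hedged sketch, assuming kohya-style datasets with one comma-separated .txt caption per image; the folder name, tags, and helper function are invented for the tomato example:

```python
from pathlib import Path

def drop_tag(caption_dir: str, tag: str) -> None:
    """Remove `tag` from every caption file in the dataset folder.

    The visual feature does not vanish with the tag: at training time it gets
    absorbed by whatever tags remain, or baked into the LoRA as its "baseline look".
    """
    for txt in Path(caption_dir).glob("*.txt"):
        tags = [t.strip() for t in txt.read_text(encoding="utf-8").split(",")]
        txt.write_text(", ".join(t for t in tags if t != tag), encoding="utf-8")

# Hypothetical usage: drop_tag("dataset/tomato_egg", "tomato")
# turns "scrambled_eggs, tomato, wok" into "scrambled_eggs, wok",
# and "add the scrambled eggs" now silently carries "and add something red and round".
```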