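On February 14th, NVIDIA published a paper titled "DoRA: Weight-Decomposed Low-Rank Adaptation", proposing a LoRA-like technique called DoRA. DoRA decomposes the pretrained weights into a magnitude component and a direction component and updates the two separately, with LoRA applied to the direction. By accounting for both magnitude and direction changes during weight updates, it achieves results closer to full fine-tuning than LoRA does.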
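To make the decomposition concrete, here is a minimal plain-Python sketch of how a merged DoRA weight could be computed. It assumes the formulation W' = m * (W0 + BA) / ||W0 + BA||_c, where ||.||_c is the Euclidean norm of each column, B (d x r) and A (r x k) are the LoRA factors, and m holds one learned magnitude per column. The function names are hypothetical and not taken from any released implementation; a real version would use a tensor library.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain-Python matrix product X @ Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def column_norms(W: Matrix) -> List[float]:
    """||W||_c: Euclidean norm of each of the k columns of a d x k matrix."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*W)]


def dora_weight(W0: Matrix, B: Matrix, A: Matrix, m: List[float]) -> Matrix:
    """Assumed DoRA merge: W' = m * (W0 + B A) / ||W0 + B A||_c, column by column."""
    BA = matmul(B, A)  # low-rank (LoRA) update applied to the direction
    V = [[w + dw for w, dw in zip(rw, rd)] for rw, rd in zip(W0, BA)]
    norms = column_norms(V)
    # Rescale every column of V to unit length, then to its learned magnitude m[j].
    return [[m[j] * row[j] / norms[j] for j in range(len(row))] for row in V]


# Toy example: d = k = 2, rank r = 1.
W0 = [[3.0, 0.0], [4.0, 2.0]]   # columns (3, 4) and (0, 2)
B = [[0.0], [0.0]]              # d x r, zero-initialized as in LoRA
A = [[0.5, -1.0]]               # r x k
m0 = column_norms(W0)           # magnitudes initialized from W0: [5.0, 2.0]
print(dora_weight(W0, B, A, m0))  # -> [[3.0, 0.0], [4.0, 2.0]]
```

With m initialized to ||W0||_c and B set to zero, the merged weight equals W0, so training starts from the pretrained model. Because m is a separate trainable vector, the magnitude of each column can change without its direction changing, and vice versa.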
Since no source code has been released at the time of writing, it is unclear when the open-source community will be able to apply this technique to models such as SD. Beyond DoRA itself, though, the paper also analyzes how LoRA differs from full fine-tuning, which I find quite interesting.
LoRA can be viewed as a low-rank approximation of full fine-tuning: as the rank increases, LoRA's results get closer to those of full fine-tuning. Previous research mostly attributed the accuracy gap between LoRA and full fine-tuning to the difference in the number of trainable parameters each one uses.
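A minimal sketch of the LoRA side, assuming the standard formulation W = W0 + scale * BA with B of shape d x r and A of shape r x k (the scale is commonly alpha / r). It also counts trainable parameters: full fine-tuning updates all d * k weights, while LoRA trains only r * (d + k), which is the parameter gap that earlier work focused on. The function names are hypothetical, chosen for illustration.

```python
from typing import List, Optional

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain-Python matrix product X @ Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_merge(W0: Matrix, B: Matrix, A: Matrix, scale: float = 1.0) -> Matrix:
    """W = W0 + scale * (B @ A); B is d x r, A is r x k, so B @ A has rank <= r."""
    delta = matmul(B, A)
    return [[w + scale * dw for w, dw in zip(rw, rd)] for rw, rd in zip(W0, delta)]


def trainable_params(d: int, k: int, r: Optional[int] = None) -> int:
    """Full fine-tuning trains d*k weights; LoRA with rank r trains r*(d + k)."""
    return d * k if r is None else r * (d + k)


W0 = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # d x r with r = 1
A = [[3.0, 4.0]]     # r x k
print(lora_merge(W0, B, A))                 # -> [[4.0, 4.0], [6.0, 9.0]]

# A 4096 x 4096 projection: full fine-tuning vs. LoRA with r = 8.
print(trainable_params(4096, 4096))         # -> 16777216
print(trainable_params(4096, 4096, r=8))    # -> 65536
```

Once r reaches min(d, k), B @ A can express any d x k update (for example, B = delta_W and A = the k x k identity when r = k), which is the sense in which a higher rank lets LoRA approach full fine-tuning.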
This study, however, tracked the magnitude and direction of the weight updates during training for full fine-tuning, LoRA, and the proposed DoRA. The authors found that in LoRA, the magnitude changes and direction changes of the weight updates are significantly positively correlated at each step of training, as shown in the following figure:
This differs markedly from full fine-tuning and DoRA, which instead show a negative correlation. In other words, LoRA tends to update weights either with large changes in both magnitude and direction or with small changes in both, so it lacks the ability to make more nuanced adjustments, such as a large change in magnitude with only a minimal change in direction. Although this may not be of practical use to everyone, it deepens our understanding of how LoRA differs from full fine-tuning, so I decided to write it down.
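The two quantities compared in the paper can be sketched as follows. Based on my reading of its definitions, each weight matrix is split column by column into a magnitude (the column norm) and a direction; delta_M averages |m_t - m_0| over the k columns, and delta_D averages 1 - cos(V_t, W_0). The helper names and toy matrices are made up for illustration. The two cases show the kinds of update discussed above: scaling columns gives a large delta_M with delta_D = 0, while rotating a column changes delta_D without changing delta_M.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def magnitude_direction_change(W0: Matrix, Wt: Matrix) -> Tuple[float, float]:
    """Return (delta_M, delta_D) between W0 and Wt, averaged over the k columns."""
    cols0, colst = list(zip(*W0)), list(zip(*Wt))
    k = len(cols0)
    # Magnitude: how far each column's norm moved from its original norm.
    delta_m = sum(abs(norm(ct) - norm(c0)) for c0, ct in zip(cols0, colst)) / k
    # Direction: 1 - cosine similarity between each updated column and its original.
    delta_d = sum(1.0 - cosine(ct, c0) for c0, ct in zip(cols0, colst)) / k
    return delta_m, delta_d


W0 = [[3.0, 0.0], [4.0, 2.0]]        # columns (3, 4) and (0, 2)

# Doubling every column: large magnitude change, no direction change.
scaled = [[6.0, 0.0], [8.0, 4.0]]
print(magnitude_direction_change(W0, scaled))   # -> (3.5, 0.0)

# Rotating the first column by 90 degrees: direction changes, magnitude does not.
rotated = [[-4.0, 0.0], [3.0, 2.0]]
print(magnitude_direction_change(W0, rotated))  # -> (0.0, 0.5)
```

Tracking these two numbers across training checkpoints is what reveals the correlation pattern: for LoRA the points tend to move together, while a method that can decouple them can reach cases like the "scaled" one above.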
Of course, this paper seems rather insignificant next to OpenAI's game-changing release of Sora. 🥲