I noticed that placing personal opinions below the model introduction may not be appropriate, so I have decided to write a separate article explaining my views in detail.
Before reading on, please note the following disclaimer: I am not a professional in this field, and everything below is solely my personal, subjective opinion; it does not reflect anyone else's views. In addition, paraphrasing and translation may have introduced inaccuracies, and since I used ChatGPT for translation, some meaning may have been distorted. If you have any questions about the content, please leave a comment in the comment section, and I will get back to you promptly.
For a long time, I used a network dimension of 32Dim (Dim stands for network_dim) to train SDXL anime-style characters. Although the results were barely satisfactory, a single SDXL anime character LoRA still required 200MB of space, which I considered too large. Just imagine: if a user downloads five such LoRA models, they will occupy 1GB. While it is true that some individual SDXL anime character models reach 1GB, I personally don't think such sizes are friendly to users' disk space.
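As a rough back-of-the-envelope check on those sizes (my own illustration, not taken from any training script): a LoRA pair for a weight matrix of shape (out, in) stores rank × (in + out) values, so file size scales roughly linearly with the dimension, and going from 32Dim to 8Dim should cut the file to about a quarter, consistent with 200MB shrinking to roughly 50MB.

```python
def lora_pair_params(out_dim: int, in_dim: int, rank: int) -> int:
    # LoRA stores two low-rank factors instead of a full (out_dim x in_dim) update:
    # down: (rank, in_dim) and up: (out_dim, rank)
    return rank * in_dim + out_dim * rank

# Hypothetical 4096x4096 attention projection, stored in fp16 (2 bytes per value):
for rank in (32, 8):
    params = lora_pair_params(4096, 4096, rank)
    print(f"rank {rank}: {params} values, {params * 2 / 1e6:.2f} MB for this layer")
# Size scales linearly with rank, so 32Dim -> 8Dim is roughly a 4x reduction.
```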
On the afternoon of October 18th, Kohaku-XL's author, Kohaku Aoba, shared a viewpoint with us: generally, when training a single character, a network dimension of 16Dim is already considered high, and even when training multiple characters, the network dimension should not exceed 48. In fact, even a 16Dim model is enough to train 50 characters with 4 sets of clothing each.
He believes that once the network dimension exceeds 48, most of the learned content is noise. Put simply, training becomes like drawing gacha cards, and the higher the network dimension, the more pronounced this phenomenon becomes. This is because the genuinely useful information is far less than the amount of information the LoRA model can store, so the LoRA starts to learn redundant content, and that redundant content is influenced by chance factors such as the seed (e.g., the order of the dataset and the timesteps sampled at each training step).
His viewpoint is based on the LyCORIS research paper, for which the team tested the performance of different algorithms, training hundreds of models, generating millions of images, and applying multiple encoding and measurement techniques to each image.
Download link for the paper: https://arxiv.org/pdf/2309.14859.pdf
Later, I asked him whether 8Dim is sufficient for training SDXL anime character LoRA models. He replied, "It's enough. You can also try LoKr: set the factor to 8-12 and use full rank." At that point, I became determined to use 8Dim (although I may have missed everything he said besides the network dimension). He further added, "I suggest you try LoKr: set the factor to 8 and set the network dimension to 100,000 (any arbitrarily high network dimension will do; it just needs to trigger LoKr's full-rank mode)."
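For context, LoKr factorizes each weight update as a Kronecker product of a small block (whose size is governed by the factor) and a large block; when the network dimension is set high enough to trigger full rank, the large block is stored as a complete matrix instead of a further low-rank pair. The snippet below is my own simplified sketch of that idea (the per-layer factorization in LyCORIS may differ in detail):

```python
import numpy as np

out_dim, in_dim, factor = 4096, 4096, 8   # hypothetical SDXL projection layer

# LoKr-style update: delta_W = kron(A, B)
A = np.random.randn(factor, factor)                       # small block from factor=8
B = np.random.randn(out_dim // factor, in_dim // factor)  # large block, kept full rank

delta_W = np.kron(A, B)      # shape (4096, 4096), same as the full weight update
print(A.size + B.size)       # ~262k stored values vs ~16.8M for the full matrix
```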
Afterwards, I concluded that 8Dim was feasible (as for the LoKr that Kohaku Aoba mentioned, I ignored it entirely). Here is a set of comparison images; I think the 8Dim results may even be better than the 32Dim ones, or at least not significantly different. Considering size, one is 200MB while the other is 50MB, so I believe the smaller 8Dim model is the better choice. Please note that this is solely my personal opinion.
The base model I used is Kohaku-XL beta. In the tests, from left to right, I used the 8Dim LoRA, the 32Dim LoRA, and no LoRA at all. All comparison images were generated with the same base image parameters. During training, I kept all other training parameters identical; the only difference was the dimension (the 32Dim LoRA used 16alpha, while the 8Dim LoRA used 4alpha). These were the settings I used for my tests.
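One detail worth noting: both runs keep the same alpha-to-dim ratio. In kohya-style LoRA, the learned update is applied scaled by alpha/dim, so matching the ratio keeps the effective update strength comparable between the two runs:

```python
def lora_scale(alpha: float, dim: int) -> float:
    # kohya-style LoRA applies the learned update as (alpha / dim) * (up @ down),
    # so keeping alpha/dim constant keeps the update strength comparable.
    return alpha / dim

print(lora_scale(16, 32))  # 0.5 for the 32Dim / 16alpha run
print(lora_scale(4, 8))    # 0.5 for the 8Dim / 4alpha run
```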
The model link is https://civitai.com/models/167584?modelVersionId=188485
There is another model for which I also made XYZ chart comparisons. In those charts, "heita_32 to 8" is a model whose dim was reduced from 32 to 8 using the Supermerger plugin.
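I haven't read Supermerger's source, but dim reduction of this kind is typically done by recombining each low-rank pair and truncating its SVD; the sketch below shows that general approach (my own illustration, not Supermerger's actual code):

```python
import numpy as np

def resize_lora_pair(up: np.ndarray, down: np.ndarray, new_rank: int):
    # Reconstruct the full update, then keep only the top new_rank singular directions.
    delta = up @ down                                   # (out_dim, in_dim)
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    new_up = U[:, :new_rank] * S[:new_rank]             # fold singular values into up
    new_down = Vt[:new_rank]
    return new_up, new_down

# Example: shrink a rank-32 pair for a hypothetical 4096x4096 layer down to rank 8.
up, down = np.random.randn(4096, 32), np.random.randn(32, 4096)
new_up, new_down = resize_lora_pair(up, down, 8)
print(new_up.shape, new_down.shape)  # (4096, 8) (8, 4096)
```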