
SDXL 二次元 Lora:50MB已经是足够的大小 | SDXL 2D Lora: 50MB is already a sufficient size.




I noticed that placing personal opinions below a model's description may not be appropriate, so I decided to write a separate article to explain my views in detail.

Before reading on, please note this disclaimer: I am not a professional in this field, and what follows is solely my own subjective opinion; it does not represent anyone else. When I relay other people's views, I may paraphrase them inaccurately or mistranslate them, which could mislead readers. In addition, I used ChatGPT for translation, which may introduce further misunderstandings. If you have any questions about the content, please leave a comment and I will see it promptly.

For a long time, I trained SDXL 2D (anime) character LoRAs with a network dimension of 32 (Dim here means network_dim). The results were passable, but each SDXL character LoRA took about 200MB, which I still consider too large. Imagine a user downloading five such LoRA models: that is already 1GB of disk space. It is true that some single SDXL 2D character models reach 1GB, but personally I don't think sizes like that are friendly to users' disks.
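The roughly 4x gap between 200MB at 32Dim and 50MB at 8Dim follows directly from how LoRA stores its weights: for every adapted layer it saves a down-projection and an up-projection whose parameter counts grow linearly with the rank. The sketch below is only a back-of-the-envelope illustration of that scaling; the layer shapes are made-up placeholders, not the actual SDXL module list.

```python
# Back-of-the-envelope estimate of how LoRA file size scales with network_dim.
# The layer shapes below are illustrative placeholders, NOT the real SDXL
# module list; the point is only that size grows linearly with the rank.

def lora_bytes(layer_shapes, rank, bytes_per_param=2):  # 2 bytes per param = fp16
    total_params = 0
    for in_dim, out_dim in layer_shapes:
        # Each adapted layer stores a down matrix (in_dim x rank)
        # and an up matrix (rank x out_dim).
        total_params += in_dim * rank + rank * out_dim
    return total_params * bytes_per_param

# Hypothetical set of adapted projections (placeholder shapes and counts).
layers = [(1280, 1280)] * 60 + [(640, 640)] * 40 + [(2048, 1280)] * 20

sizes = {rank: lora_bytes(layers, rank) for rank in (8, 32)}
print(f"rank 32 is {sizes[32] / sizes[8]:.1f}x the size of rank 8")
# Whatever the real module list looks like, quartering the rank quarters the
# file, which is how 32Dim at ~200MB maps to 8Dim at ~50MB.
```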

On the afternoon of October 18th, the author of Kohaku-XL, Kohaku Aoba, shared a viewpoint with us: in general, a network dimension of 16 is already high when training a single character, and even when training multiple characters the network dimension should not exceed 48. In fact, 16Dim is enough to train 50 characters with 4 outfits each.

He believes that once the network dimension exceeds 48, most of what is learned is noise. Put simply, training itself starts to resemble a gacha pull, and the higher the network dimension, the more pronounced this becomes. The reason is that the genuinely useful information is far less than what the LoRA can store, so the LoRA starts learning redundant content, and that redundant content is influenced by seed-dependent factors (for example, the order of the dataset and the timesteps sampled in each training run).

His view is based on the LyCORIS paper: the team tested the performance of different algorithms, trained hundreds of models, generated millions of images, and computed multiple encodings and metrics for every image.

Download link for the paper: https://arxiv.org/pdf/2309.14859.pdf

Later, I asked him whether 8Dim is enough for training an SDXL 2D character LoRA. He replied, "It's enough; you can also try LoKr, set the factor to 8-12, and use full rank." At that point I settled on using 8Dim (though it is possible that, apart from the network dimension, I did not catch the rest). He added, "I suggest you try LoKr: set the factor to 8 and the network dimension to 100,000 (give it an arbitrarily high network dimension to trigger LoKr's full-dimension mode)."
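For readers who want to try that LoKr suggestion, this is roughly how it could be expressed with kohya-ss sd-scripts plus the LyCORIS module, the toolchain usually used for this kind of training. Treat it as a hedged sketch: the paths are placeholders, a real run needs many more arguments (resolution, learning rate, and so on), the flag names should be checked against the versions you have installed, and the LoKr-specific values simply restate Kohaku Aoba's advice above.

```python
# Hedged sketch: assembling a LoKr training command for kohya-ss sd-scripts
# with the LyCORIS network module. Paths are placeholders, and a real run
# needs additional arguments (resolution, learning rate, captions, etc.).
import subprocess

cmd = [
    "accelerate", "launch", "sdxl_train_network.py",
    "--pretrained_model_name_or_path", "kohaku-xl-beta.safetensors",  # placeholder path
    "--train_data_dir", "./dataset",                                  # placeholder path
    "--output_dir", "./output",
    "--network_module", "lycoris.kohya",       # use LyCORIS instead of the built-in LoRA
    "--network_dim", "100000",                  # absurdly high rank so LoKr uses its full dimension
    "--network_args", "algo=lokr", "factor=8",  # the suggested factor range was 8-12
]
subprocess.run(cmd, check=True)
```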

After that, I was convinced that 8Dim is feasible (as for the LoKr that Kohaku Aoba mentioned, I ignored it entirely). Below is a set of comparison images. I think the 8Dim results may even be better than 32Dim, and even if they are not, the gap is small. Considering that one file is 200MB and the other is 50MB, I believe the smaller 8Dim is the better choice. Please note that this is only my personal opinion.

The base model I used is Kohaku-XL beta. In the tests, the images use, from left to right, the 8Dim LoRA, the 32Dim LoRA, and no LoRA, all generated with the same parameters. During training I kept every other training parameter identical; the only difference was the dimension (the 32Dim LoRA used alpha 16, while the 8Dim LoRA used alpha 4). Those were my test settings.
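One detail worth spelling out: with alpha 16 at 32Dim and alpha 4 at 8Dim, the alpha/dim ratio is 0.5 in both runs. Under the usual LoRA convention, the update is scaled by alpha divided by dim, so keeping the ratio fixed means the two runs really only differ in rank. The snippet below is just a quick illustration of that arithmetic, not code taken from any training script.

```python
# LoRA output is conventionally scaled by alpha / dim, so keeping that ratio
# fixed makes runs with different ranks comparable without retuning anything.
configs = {"32Dim LoRA": (32, 16), "8Dim LoRA": (8, 4)}

for name, (dim, alpha) in configs.items():
    print(f"{name}: scale = alpha/dim = {alpha}/{dim} = {alpha / dim}")
# Both print 0.5, so the rank is the only variable left in the comparison.
```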

The model link is https://civitai.com/models/167584?modelVersionId=188485

And: Star Rail_XL 星穹铁道 黑塔 Herta 8dim/32dim - Dim8 | Stable Diffusion LoRA | Civitai

There is another model for which I also made an XYZ chart comparison. In it, "heita_32 to 8" is the 32Dim model with its dim reduced from 32 to 8 using the Supermerger extension.
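For readers wondering what "reducing the dim from 32 to 8" actually does: rank reduction of this kind is typically a truncated SVD of each layer's LoRA update, keeping only the strongest directions. The sketch below shows that idea on a single, randomly generated pair of LoRA matrices; it illustrates the general technique only, is not Supermerger's actual code, and the shapes are placeholders.

```python
# Minimal sketch of rank reduction for one LoRA layer via truncated SVD.
# Illustrates the general idea only; this is not Supermerger's implementation,
# and the shapes below are placeholders.
import numpy as np

rng = np.random.default_rng(0)
in_dim, out_dim, old_rank, new_rank = 1280, 1280, 32, 8

# A rank-32 LoRA stores a down matrix (old_rank x in_dim) and an up matrix
# (out_dim x old_rank); their product is the layer's full weight delta.
down = rng.standard_normal((old_rank, in_dim)).astype(np.float32)
up = rng.standard_normal((out_dim, old_rank)).astype(np.float32)
delta_w = up @ down                                   # (out_dim, in_dim)

# Truncated SVD: keep only the 8 largest singular values/directions.
u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
u, s, vt = u[:, :new_rank], s[:new_rank], vt[:new_rank, :]

# Split the singular values between the two factors to form a rank-8 LoRA.
new_up = u * np.sqrt(s)                               # (out_dim, new_rank)
new_down = np.sqrt(s)[:, None] * vt                   # (new_rank, in_dim)

err = np.linalg.norm(delta_w - new_up @ new_down) / np.linalg.norm(delta_w)
print(f"relative error after truncating rank {old_rank} -> {new_rank}: {err:.3f}")
# With random factors the spectrum is flat, so the error here is large; trained
# LoRAs usually concentrate most of their energy in the top directions, which
# is why this kind of reduction can work well in practice.
```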
