Experimental Concept: Ckpt as a Lora. Can a Ckpt be compressed to about 30% of its size? Part 3

Part1 (This article is Part 3.)

At the end of November, when I did most of this work, Hassaku was in the Top 4, but in December Counterfeit started to rise. I considered redoing everything, but instead decided to create a new interim reference model, leaving the previous parts as they were and redoing only the parts where the images did not match. The model I used had the same name as the first one, RM15, so the PNG information may be confusing. This model is published as RM01, and that is the name used in the rest of this article.

Let's look at it in a little more detail. KawaiiAnimetic is characterized by its painting style. There are many anime models in the world, but most of them only draw anime-style characters and do not reproduce an anime-style painting style. Many of them look like illustrations instead. KawaiiAnimetic's value lies in how closely it reproduces the coloring of cel drawings.

The difference in coloring is easiest to see on skin, so let's change the prompt to a swimsuit.

Dimension 256 seems to be very effective. However, with one Lora, although the character itself is quite similar, the painting style is not reproduced. Even with two Loras there are still some areas where it falls short, but the result is good enough to pass.

It would be ideal to work with just one Lora, so I checked whether the painting style could be reproduced by changing the strength at which the Lora is applied.

It looks like the style can be roughly reproduced with a strength of about 1.2.
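To illustrate what the strength value does, here is a minimal sketch in plain Python. It assumes the usual Lora formulation, in which a low-rank update is added to the base weight after being multiplied by the strength. The tiny matrices and the function names `matmul` and `apply_lora` are made up for illustration; real tools work on large tensors and also apply an alpha/rank scale that is omitted here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(base, up, down, strength):
    """Return base + strength * (up @ down), element by element."""
    delta = matmul(up, down)
    return [
        [w + strength * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


# Hypothetical 2x2 base weight and a rank-1 update (up: 2x1, down: 1x2).
base = [[1.0, 0.0], [0.0, 1.0]]
up = [[1.0], [2.0]]
down = [[0.5, -0.5]]

normal = apply_lora(base, up, down, 1.0)   # Lora applied at strength 1.0
boosted = apply_lora(base, up, down, 1.2)  # same Lora pushed 20% further

print(normal)
print(boosted)
```

Raising the strength from 1.0 to 1.2 moves every weight 20% further along the same direction, which is why it can bring out a style that a single Lora only partly captures. In the AUTOMATIC1111 web UI the strength is the last field of the prompt tag, for example `<lora:filename:1.2>`, where `filename` is a placeholder.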

Here is what this tells us:

Even after conversion to a Lora, the general characteristics of a model are captured fairly well.
Detailed features are difficult to reproduce with a single Lora.
Differences in the default values assumed by each model can be adjusted for.
Detailed features can be reproduced to some extent by adjusting the Lora's strength.

There is also a model, R-Fantasy, which I created on the idea that reproducibility would be higher if the ckpt were built on the reference model from the beginning. My conclusion is that it does not make much of a difference. I think this is because block merging and elemental merging change the internal configuration considerably.

Of course, I have tried models other than KawaiiAnimetic, but since CIVITAI does not let me include enough images, I have narrowed the examples down to the one that is easiest to understand. As you can probably guess, converting from a photographic style to anime drawings is a somewhat difficult case. I am able to reproduce most of my other models. Among models that are not mine, I failed to reproduce Midjourney Papercut, although the result might be slightly different with dimension 256. In general, models where fine details matter are difficult to reproduce.

Conclusion

Let's summarize the advantages and disadvantages of running a base model as a Lora.

+ Reduced storage capacity required

+ The accuracy of ordinary Loras may improve

+ You can do the equivalent of ordinary model merging directly in the prompt

- The number of files increases

- Image quality may decrease

- Burnt (overcooked) images are more likely to occur
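The "merging in the prompt" advantage can be sketched with toy numbers. The sketch assumes each model is the reference model plus a difference, and that each Lora captures that difference exactly (a real Lora is only a low-rank approximation, so the match would be rough rather than exact). Under those assumptions, applying two Loras at strengths that sum to 1 gives the same weights as a weighted-sum merge of the two models. All values and function names below are hypothetical.

```python
def weighted_merge(model_a, model_b, ratio):
    """Classic weighted-sum merge: (1 - ratio) * A + ratio * B."""
    return [(1 - ratio) * a + ratio * b for a, b in zip(model_a, model_b)]


def apply_diffs(reference, diffs_with_strength):
    """Reference weights plus each difference scaled by its strength."""
    out = list(reference)
    for diff, strength in diffs_with_strength:
        out = [w + strength * d for w, d in zip(out, diff)]
    return out


# Toy flattened weights for a reference model and two derived models.
reference = [0.10, -0.20, 0.30]
model_a = [0.30, -0.10, 0.20]
model_b = [0.00, -0.40, 0.50]

# Differences from the reference; stand-ins for the extracted Loras.
diff_a = [a - r for a, r in zip(model_a, reference)]
diff_b = [b - r for b, r in zip(model_b, reference)]

merged = weighted_merge(model_a, model_b, ratio=0.3)
on_prompt = apply_diffs(reference, [(diff_a, 0.7), (diff_b, 0.3)])

# The two results agree up to floating-point rounding.
max_gap = max(abs(m - p) for m, p in zip(merged, on_prompt))
print(max_gap)
```

The same picture also hints at the burning problem: when the strengths add up to much more than 1, the weights are pushed beyond anything a normal merge would produce.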

As shown above, several Loras together can produce output that is "roughly" the same as the original ckpt (with some models as exceptions).

With dimension 128, four Loras together come to only about 576MB, so you can create a similar picture with 29% of the file size. Furthermore, if you can work with a single 288MB Lora, the size drops to less than 15%. I haven't tried it with SDXL, but if a model can be roughly reproduced with four 436MB Loras, you would save about 4.7GB per model.
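The arithmetic behind these figures can be checked with a few lines of Python. The checkpoint sizes are not stated in the article; the values below (about 1990MB for the SD1.5 ckpt and about 6460MB for an SDXL ckpt) are assumptions back-solved from the percentages and savings quoted above, and the function name `ratio_percent` is just for illustration.

```python
def ratio_percent(lora_mb, count, ckpt_mb):
    """Total Lora size as a percentage of the checkpoint size."""
    return 100.0 * lora_mb * count / ckpt_mb


SD15_CKPT_MB = 1990  # assumption: typical size, not stated in the article
SDXL_CKPT_MB = 6460  # assumption: typical size, not stated in the article

# Four Loras totalling 576MB means 144MB each.
four_loras = ratio_percent(144, 4, SD15_CKPT_MB)
one_lora = ratio_percent(288, 1, SD15_CKPT_MB)

# SDXL: four 436MB Loras instead of one full checkpoint.
sdxl_saving_gb = (SDXL_CKPT_MB - 4 * 436) / 1000

print(f"four Loras: {four_loras:.1f}% of the ckpt")
print(f"one Lora:   {one_lora:.1f}% of the ckpt")
print(f"SDXL saving: {sdxl_saving_gb:.2f}GB per model")
```

Note that the saving only applies to each additional model; the reference model itself still has to be stored once in full.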

It would be ideal to reproduce the original model completely, but we know this is impossible because of the amount of information involved, so this method does not aim for that. However, I believe that 80% to 90% reproduction is enough to be practical. Issues remain: the process currently requires very tedious manual work, and setting a dimension of 512 or more causes an error in SuperMerger, so the maximum usable dimension is 256.

Still, with just a little understanding, this method seems to open up a variety of possibilities. With some effort, you might even be able to do something like a block merge in the prompt.

Naturally, another approach would be possible: one that does not take the form of a Lora and instead compresses the data at a more fundamental level by extracting the differences from the reference model completely. However, at present, I believe this Lora conversion method is meaningful because anyone can implement it quickly (albeit tediously) and it produces pictures of a practical level.

It does not have to be this method, but if some kind of common infrastructure can be established, I expect the image generation ecosystem to spread further.

(Most of this article was translated with Google Translate. I apologize for any poor English.)

Part1

Part2

New:

I have released Lora versions of the ckpt, created using this method.

There are two files, each with dimension 256.