A beginner's report on LoRA training with Illustrious

This article is an automatic translation of this Japanese article.

Introduction

　Illustrious XL is somewhat difficult to use, but it has the potential to become the third most popular model after animagine and Pony. It uses the familiar Danbooru tag vocabulary, makes fewer compositional errors, can reproduce many characters, Art Styles, compositions, etc. without additional training, and has lower VRAM requirements than Flux. Personally, I feel that it is superior to Pony in overall performance.

　However, even Illustrious cannot output what it does not know, so there will inevitably be situations where LoRA training is necessary.

　In this article, I would like to give my personal impressions of how well the Illustrious series responds to LoRA training. Please note that this article is intended for readers who already have general experience with LoRA training.

Art Style

　Since Illustrious was trained on danbooru, it has memorized the Art Styles of many artists whose work is posted there. It is therefore worth checking first whether the Art Style you want is already known to the model, so that you do not waste your time. I myself wasted several hours training the Art Styles of my favorite doujinshi and eroge illustrators: ....... (Of course, these were prototypes for my own private adult use, so I have no plans to publish them. They were good enough for that purpose.)

　Training an Art Style LoRA is simple. You just need to collect images in the Art Style you want to reproduce, put the trigger word at the start of each image's tag list, and let the LoRA learn the Art Style.
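The tagging step can be sketched as follows. This is only a minimal illustration, assuming each training image has a comma-separated tag file with the same name and a `.txt` extension; the function name `prepend_trigger` and the folder layout are hypothetical and not taken from any particular training tool.

```python
from pathlib import Path


def prepend_trigger(caption_dir: str, trigger: str) -> int:
    """Put `trigger` at the front of every .txt caption in `caption_dir`.

    Returns the number of caption files that were rewritten.
    """
    changed = 0
    for path in sorted(Path(caption_dir).glob("*.txt")):
        old_text = path.read_text(encoding="utf-8").strip()
        # Captions are assumed to be comma-separated Danbooru-style tags.
        tags = [t.strip() for t in old_text.split(",")]
        # Drop empty entries and any existing copy of the trigger word.
        tags = [t for t in tags if t and t != trigger]
        new_text = ", ".join([trigger] + tags)
        if new_text != old_text:
            path.write_text(new_text, encoding="utf-8")
            changed += 1
    return changed
```

Running it twice on the same folder leaves the captions unchanged the second time, so it is safe to re-run after adding more images.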

　Since I have already created LoRAs for “Kantai collection,” I will use “Kantai collection 1944 - Itsuka Ano Umi de.” to test Art Style training.

　I use Obsession (Illustrious-XL) as the pretrained model (among Illustrious derivative models, Obsession has a relatively stable Art Style and high quality). 24 arbitrarily chosen screenshots are used as training images. The other settings are batch size 4, 40 epochs, cosine scheduler, Prodigy optimizer, Learning Rate / TELR / UnetLR all set to 1, dimension 8, network alpha 8, and LoRA as the output type. (Personally, I think it is better to set dim and alpha to 16, especially for Art Style LoRA.)
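To get a feel for how long such a run is, the settings above can be written down and the number of optimizer steps estimated. This is a rough sketch: the dictionary keys are illustrative names rather than the options of a specific trainer, and the number of repeats per image is not stated in this article, so `repeats=1` is an assumption.

```python
import math

# Settings from the paragraph above; key names are illustrative only.
settings = {
    "optimizer": "Prodigy",
    "lr_scheduler": "cosine",
    "learning_rate": 1.0,
    "text_encoder_lr": 1.0,
    "unet_lr": 1.0,
    "network_dim": 8,
    "network_alpha": 8,
    "batch_size": 4,
    "epochs": 40,
}


def total_steps(num_images: int, repeats: int, batch_size: int, epochs: int) -> int:
    """Estimate optimizer steps, rounding the last partial batch of each epoch up."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# 24 screenshots, assumed 1 repeat each: 6 steps per epoch x 40 epochs.
steps = total_steps(24, repeats=1,
                    batch_size=settings["batch_size"],
                    epochs=settings["epochs"])
print(steps)  # 240
```

With more repeats per image the step count scales proportionally, so the real run may well have been longer than this estimate.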

Screenshot of training images (for reference)

Output using LoRA

The prompt used for these outputs is just “<LoRA:1> trigger word, character name, serafuku (for Yamashiro only: nontraditional miko, headgear), background location”. As you can see, Illustrious remembers the character names and responds well to Art Style training.

Unlearned clothes and characters

　Since Illustrious was trained on danbooru, if you enter a character name that exists as a danbooru tag, it will reproduce the face (especially hair color, hairstyle, and ornaments around the face) and body shape (it often seems to understand the bust size as well). However, if few images of the character have been posted to danbooru, it is more likely that the model has not learned the character, in which case it outputs an imaginary character.

　The same applies to clothing: the more eccentric the outfit and the harder it is to describe in a prompt, the less accurately Illustrious will output it.

　Here, I tried to see whether I could recreate the casual-clothes Heavy Cruiser Princess from the 2017 Kantai Collection x Mitsukoshi collaboration. (The danbooru tag is heavy_cruiser_princess, but Illustrious does not output her. Also, there are only 6 posts on danbooru of heavy_cruiser_princess in her Mitsukoshi casual clothes.)

I extracted a prompt from the official heavy_cruiser_princess illustration via i2i and used that prompt for t2i. As a result,

something totally different was output. Naturally, this means that the model does not know the outfit from the Mitsukoshi collaboration.

　So, I created a temporary LoRA from the 6 danbooru images and used it to pad out the set of training images to about 40, with different poses.

　After tagging these and training with the settings from the “Art Style” section, I was able to output something recognizable as the heavy_cruiser_princess from the Mitsukoshi collaboration, albeit at a low success rate.

　I hope this shows that Illustrious can be taught even characters it has not learned.

Separation of multiple concepts learned in a single tag

　Lawson, a convenience store chain, has collaborated with many anime and video games, and a lot of fan art showing those characters in Lawson uniforms has been posted. Therefore, in Illustrious, you can generate characters wearing Lawson uniforms simply by adding the tag “lawson”.

　However, Lawson changed its uniforms in 2016. Because of this, Illustrious has learned both the old and new uniforms under the single “lawson” tag.

The old uniform (5th generation) and the new uniform (6th generation).

Therefore, with a certain probability it outputs uniforms that cannot be identified as either old or new, or that are a mixture of both. (In the picture shown earlier, the uniform has the 5th generation red line even though it is the 6th generation design.)

　I will create a separate LoRA for each of the 5th and 6th generation uniforms so that each can be output with high probability. Referring to the method here, I roughly paint the faces black and tag the images for training.

I collected about 35 training images for each uniform, but it may not be necessary to collect that many.
Here is the output result. The LoRAs were able to separate the 5th and 6th generation uniforms.

　This was probably not difficult, because the LoRA only had to extract part of what the model already remembers.

Characters with multiple outfits

　How many official costumes does the character you want to output have?

　If the character has only one costume, or if you only want the character undressed for adult scenes, you won't have much trouble.

　But there are times when a character wears several different outfits over the course of a work. There is no way that Illustrious remembers each and every one of them. Also, in most cases the patterns on such costumes are distinctive, so it is unlikely that a prompt alone can reproduce them. Therefore, we need to create a character LoRA that has learned multiple costumes.

　You can find more information on how to create a character LoRA with multiple costumes here and here.

　In this case, I would like to make Fuso from Kantai collection. Fuso is a character that Illustrious has already learned, so there is no need to train her face or her standard miko-like uniform. However, Fuso has no fewer than four official outfits: her Kai-II uniform with a cherry blossom pattern, her formal kimono for New Year's, her kimono for the rainy season, and her swimsuit. (I omitted the Zuiun Festival and Sanma outfits. ......)
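One common way to keep several costumes apart in a single LoRA is to give each costume its own trigger tag in the captions. The sketch below only illustrates that idea: the trigger names in `COSTUME_TRIGGERS`, the character tag `fusou_(kancolle)`, and the helper `build_caption` are all assumptions made for this example, since this article does not state which tags were actually used.

```python
# Hypothetical per-costume trigger tags (not taken from the article).
COSTUME_TRIGGERS = {
    "kai2_uniform": "fuso_kai2",
    "new_year": "fuso_newyear_kimono",
    "rainy_season": "fuso_rainy_kimono",
    "swimsuit": "fuso_swimsuit",
}


def build_caption(costume: str, tags: list[str],
                  character: str = "fusou_(kancolle)") -> str:
    """Build a caption: character tag, costume trigger, then the other tags."""
    ordered = [character, COSTUME_TRIGGERS[costume]]
    for tag in tags:
        tag = tag.strip()
        # Skip blanks and duplicates while keeping the original tag order.
        if tag and tag not in ordered:
            ordered.append(tag)
    return ", ".join(ordered)


print(build_caption("rainy_season", ["1girl", "umbrella", "1girl"]))
# fusou_(kancolle), fuso_rainy_kimono, 1girl, umbrella
```

At generation time, the same costume trigger would then be placed in the prompt next to the character name to select one outfit.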

Since there is only a small amount of fan art of the New Year's outfit and the rainy season kimono, I first created a temporary LoRA and used it to increase the number of training images to about 35 before training the final LoRA. Here is the result.

　The hands are melted, and the result is not at a level where it can be published without more refinement, but I was able to output costumes that look like the originals.

　The most important point I would like to make here is that Illustrious, unlike Pony, can combine multiple types of kimonos in one LoRA. Pony had the problem that patterns got mixed up very easily when multiple types of kimonos were included. Illustrious has great potential in that it can overcome this drawback.

　That concludes my report on LoRA training with Illustrious. The currently released version is a v0.1 prototype, but I would like to see the finished version released as well, if possible.