Workflow for creating high quality images with our LoRAs

Hi guys. We are The Holy See, and we are going to show you the workflow we use to get the image quality you see on our Pixiv.

This is a long post, so try to read it carefully and check that all the items are correct before generating.

Why Stable Diffusion 2.1+?

We ran many tests before we even started our Pixiv. We began by training some LoRAs on Stable Diffusion 1.5 anime checkpoints like Anything, Aurora ONE, etc. However, we got the best results and detail retention with Stable Diffusion 2.1 models.

We know there is a lack of 2.1 models on CivitAI, so it will be hard to mix many LoRA models to get the results you are used to. Still, we want to show you our workflow and help you replicate what we do if you are interested.

Requirements

We will be using the latest AUTOMATIC1111 version at the time of writing (v1.3.2).
A Stable Diffusion 2.1+ model. In our case we are using Waifu Diffusion 1.4, which is the model all our LoRAs are trained on. (Note that 1.4 is the version of the Waifu Diffusion checkpoint. It is not related to the Stable Diffusion version; it is still a 2.1 model.)
A special Waifu Diffusion VAE provided in the Kohya-ss Colab notebook. Check this link.
KohakuBlueleaf's LoCon extension. You can use the LyCORIS extension if you want, but LyCORIS was giving us some problems with Latent Mode at the time of writing. We will explain more about that later.
Hako-mikan's Regional Prompter. Keep an eye on this one: AUTOMATIC1111 updates sometimes break Latent Mode, which causes problems, but the author of the extension helps to fix it when that happens.
AnimeSharp upscaler.

#1 - Getting our 2.1 Model ready

RuntimeError: The size of tensor a (768) must match the size of tensor b (1024) at non-singleton dimension 1

This is a common error that many people get when trying to use our LoRAs. In most cases it happens because you are trying to use the LoRA with a Stable Diffusion 1.5 model. The two numbers are a hint: 768 is the text-embedding width used by 1.5 models, and 1024 is the one used by 2.x models.

You will get a similar error the other way around, when you use add-ons made for 1.5 with 2.1 models (for example, ControlNet models).
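To make the mismatch concrete, here is a minimal Python sketch of the check that is failing under the hood. The helper `check_compatible` and the family labels `"sd1"`/`"sd2"` are names we made up for illustration; they are not part of AUTOMATIC1111 or any extension. The only facts it relies on are the two embedding widths (768 for 1.x models, 1024 for 2.x models).

```python
# Width of the text embeddings each Stable Diffusion family works with.
# SD 1.x uses CLIP ViT-L/14 (768); SD 2.x uses OpenCLIP ViT-H/14 (1024).
CONTEXT_DIM = {"sd1": 768, "sd2": 1024}


def check_compatible(checkpoint_family: str, addon_family: str) -> None:
    """Raise ValueError if an add-on (LoRA, ControlNet, ...) targets another family.

    Hypothetical helper for illustration only.
    """
    addon_dim = CONTEXT_DIM[addon_family]
    checkpoint_dim = CONTEXT_DIM[checkpoint_family]
    if addon_dim != checkpoint_dim:
        raise ValueError(
            f"add-on expects width {addon_dim} but the checkpoint uses "
            f"{checkpoint_dim}: they were made for different SD families"
        )


# Waifu Diffusion 1.4 (a 2.1 model) with a LoRA trained on it: no error.
check_compatible("sd2", "sd2")

# A 1.5 checkpoint with a LoRA trained on a 2.1 model: the sizes do not match.
try:
    check_compatible("sd1", "sd2")
except ValueError as err:
    print(err)
```

In practice the fix is always the same: load a checkpoint from the same family the LoRA was trained on.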

We will start by installing our Waifu Diffusion 1.4 model. Just drag and drop it into place like any other model. You don't need to change anything in your AUTOMATIC1111 installation for now.

After doing this, you also need to install the Waifu Diffusion VAE in the models/VAE folder.

To enable it, go to Settings > Show all pages, search for VAE and pick the Waifu Diffusion VAE in the dropdown. Apply the changes and we are done.

#2 - Installing Extensions

Install the LoCon and Regional Prompter extensions. Make sure both extensions work with the AUTOMATIC1111 version you have. If you get an error, check the Issues section of each extension to see whether it is related to the extension itself.

AUTOMATIC1111 updates tend to break some extensions, so this is a headache for us.

If you don't know how to install the extensions, copy the links we provided, go to Extensions > Install from URL and paste each one into the URL for extension's git repository field. After installing them, go back to the Installed tab and press Apply and restart UI.

#3 - Installing AnimeSharp

We use AnimeSharp to upscale our images. Without this upscaling, we can't get images bigger than 768x768. Download the AnimeSharp .pth file and put it in the models/ESRGAN folder. Then reload the UI if the upscaler is not visible in the Hires section.
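If something does not show up in the UI, a quick file check can save some time. This is only a sketch: the folder names are the ones AUTOMATIC1111 uses by default (models/Stable-diffusion is where checkpoints normally go), but the file names and the install path below are placeholders, so replace them with your own.

```python
from pathlib import Path

# Folder -> file expected inside it. The file names are placeholders:
# use the names of the files you actually downloaded.
EXPECTED_FILES = {
    "models/Stable-diffusion": "your_wd14_checkpoint.safetensors",
    "models/VAE": "your_wd_vae.pt",
    "models/ESRGAN": "your_animesharp.pth",
}


def missing_files(webui_root: Path, expected: dict[str, str]) -> list[str]:
    """Return the relative paths that are not present under webui_root."""
    missing = []
    for folder, filename in expected.items():
        if not (webui_root / folder / filename).is_file():
            missing.append(f"{folder}/{filename}")
    return missing


if __name__ == "__main__":
    root = Path("stable-diffusion-webui")  # assumed install location
    for path in missing_files(root, EXPECTED_FILES):
        print("missing:", path)
```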

#4 - Configuring your generation settings

You can play around a little with CFG, sampler and width/height, but remember to keep the width, the height and the upscaled size at multiples of 8.

To use the upscaler and Regional Prompter's Latent Mode correctly and without errors, all of these numbers must be multiples of 8.

This is the configuration we use for most of our images:

Why 1.75 in our upscaler? Easy. 1.75 turns 512x768 into 896x1344, which are also multiples of 8, so it works with the workflow. You can play with the denoising strength too.
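If you want to try other sizes or upscale factors, the multiple-of-8 rule is easy to check before generating. The sketch below is plain arithmetic; `hires_size` is a name we made up for illustration and has nothing to do with AUTOMATIC1111's internals.

```python
def is_multiple_of_8(value: float) -> bool:
    """True when value is a whole number divisible by 8."""
    return value == int(value) and int(value) % 8 == 0


def hires_size(width: int, height: int, scale: float) -> tuple[int, int]:
    """Return the upscaled size, or raise if any dimension breaks the rule."""
    new_w, new_h = width * scale, height * scale
    for name, value in (("width", width), ("height", height),
                        ("upscaled width", new_w), ("upscaled height", new_h)):
        if not is_multiple_of_8(value):
            raise ValueError(f"{name} = {value} is not a multiple of 8")
    return int(new_w), int(new_h)


print(hires_size(512, 768, 1.75))  # (896, 1344)

# Try upscale factors from 1.25 to 2.0 in steps of 0.05 and keep the ones
# that are safe for a 512x768 base image.
valid_scales = []
for step in range(25, 41):
    scale = step * 5 / 100
    try:
        hires_size(512, 768, scale)
        valid_scales.append(scale)
    except ValueError:
        pass
print(valid_scales)
```

A factor like 1.6 fails here because 512 x 1.6 = 819.2, which is not even a whole number.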

Also, don't forget to replace the Latent upscaler with AnimeSharp; it helps a little with anime pictures.

With this configuration you can already generate decent images with our LoRAs. However, we are only halfway there. If you know what you are doing, you can start from here with no problems. The steps ahead are optional, but they helped us increase the quality of our pictures a lot.

#4 - (OPTIONAL) Prompt Workflow

As we generate single characters in a particular scene for now, we try to keep our prompts as clean and ordered as possible so we can quickly change poses and expressions. Let's take a look:

1girl, high_quality, 8k, masterpiece, (round_pupil:1.2), vivid_colors, (high_quality_eyes:1.2), 

(luxury_hotel_room:1.2), (bed:1.2), (tv:1.2), (plants:1.2),

looking_at_viewer, (sitting:1.2),

blushed, (naughty_smile:1.2), frown,

(red_frilled_bikini:1.2),

thighs, hips, cleavage, navel,

(small_breasts:1.3), twintails, (red_twintail_ribbons:1.2), (sexy_body:1.2), bangs, <lora:sksnico:0.7>

This prompt generates Nico Yazawa wearing a red bikini. It is part of the Pure Girls Project, which involves many members of Love Live!. You can check it on Pixiv if you are interested in the result.

As you can see, our prompt has some blank lines. They help us divide the prompt into the following groups:

General prompt quality: Tags like masterpiece, 8k and other quality tags that raise the quality of the generated image.
Environment tags: Location, decoration and anything else that is not part of the character and describes the scene itself.
Character pose tags: Eye direction, position and perspective. For example, Nico looking at the camera and sitting somewhere. You can find more tags on Danbooru.
Character expression tags: These tags show the character's expression and feelings. For example, Nico blushing and showing us the naughty smile that characterizes her.
Outfit tags: What the character is wearing, on a line of its own. In this prompt, the red frilled bikini.
Appeal tags: Tags that describe which parts of the body must be shown. For example, emphasizing her thighs and hips a little and showing her navel (in the case of this prompt, which is part of the beach scene).
Character attributes: Character-specific attributes and high-priority tags for your image. These are the tags that make your character unique. For example, Nico is well remembered for being flat and having twintails. We also add the sexy body tag to increase her appeal a little in most of the pictures.

This kind of ordering will also help us get through the next step faster.
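If you keep your prompts in a script or a notes file, the same ordering can be written down as data. The following Python sketch is just one way to do it: the section labels and the `build_prompt` helper are our own names for illustration, and the tags are copied from the example prompt above (one entry per line of that prompt).

```python
# One entry per line of the example prompt, in the same order.
PROMPT_SECTIONS = {
    "quality": "1girl, high_quality, 8k, masterpiece, (round_pupil:1.2), "
               "vivid_colors, (high_quality_eyes:1.2),",
    "environment": "(luxury_hotel_room:1.2), (bed:1.2), (tv:1.2), (plants:1.2),",
    "pose": "looking_at_viewer, (sitting:1.2),",
    "expression": "blushed, (naughty_smile:1.2), frown,",
    "outfit": "(red_frilled_bikini:1.2),",
    "appeal": "thighs, hips, cleavage, navel,",
    "character": "(small_breasts:1.3), twintails, (red_twintail_ribbons:1.2), "
                 "(sexy_body:1.2), bangs, <lora:sksnico:0.7>",
}


def build_prompt(sections: dict[str, str], **overrides: str) -> str:
    """Join the sections with blank lines, replacing any overridden section."""
    unknown = set(overrides) - set(sections)
    if unknown:
        raise KeyError(f"unknown section(s): {sorted(unknown)}")
    merged = {**sections, **overrides}
    return "\n\n".join(merged[name] for name in sections)


# Same scene and character, different pose: only one section changes.
print(build_prompt(PROMPT_SECTIONS, pose="looking_at_viewer, (standing:1.2),"))
```

Because every section sits in its own slot, swapping a pose or an expression never touches the quality tags or the character attributes.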

#5 - (OPTIONAL) Regional Prompter

Hako-mikan's Regional Prompter has the two most important features we use to achieve the quality of our pictures: Latent Mode and the base prompt.

We don't know if there is another way to achieve this without the extension, but, for now, we are just describing our workflow using it.

Regional Prompter changes how the U-Net calculations are made and allows us to have per-region prompts.

However, we are only using Latent Mode, which changes how LoRA-based prompts are generated in order to allow multiple LoRAs in the prompt, for example, for two characters.

We also apply the base prompt option, which adds an extra quality layer to the image.

You can check how this extension works in the GitHub README, but we will explain how our settings are configured to achieve our results:

Divide Mode and Ratio: Horizontal and 1. This is the way to define only a single region instead of multiple regions with different prompts.
Base Ratio: 0.3. You can play with this value.
Generation Mode: Latent. This is how we increase the quality of the picture, but it makes generation slower.
Threshold: 0.4. You can play with this value.
Enable the "Use base prompt" check.

Now, as you have seen in the screenshot, this extension generates a template showing how to set up the prompt so that it matches the regions you defined. It would normally contain ADDCOL/ADDROW tags but, as we keep a single region, they disappear. It still keeps an ADDBASE tag, which we use to address the base prompt as an additional quality layer.

In the previous step, we sorted our prompt into categories. Now we just copy them twice to let Regional Prompter know about them, so the prompt is included in both the base prompt and the general prompt.

1girl, high_quality, 8k, masterpiece, (round_pupil:1.2), vivid_colors, (high_quality_eyes:1.2), 

(luxury_hotel_room:1.2), (bed:1.2), (tv:1.2), (plants:1.2),

looking_at_viewer, (sitting:1.2),

blushed, (naughty_smile:1.2), frown,

(red_frilled_bikini:1.2),

thighs, hips, cleavage, navel,

ADDBASE

1girl, high_quality, 8k, masterpiece, (round_pupil:1.2), vivid_colors, (high_quality_eyes:1.2), 

(luxury_hotel_room:1.2), (bed:1.2), (tv:1.2), (plants:1.2),

looking_at_viewer, (sitting:1.2),

blushed, (naughty_smile:1.2), frown,

(red_frilled_bikini:1.2),

thighs, hips, cleavage, navel,

(small_breasts:1.3), twintails, (red_twintail_ribbons:1.2), (sexy_body:1.2), bangs, <lora:sksnico:0.7>

Notice how we duplicate all the categories of the prompt except the character attributes. You can also include some of them, but never include the LoRA in both prompts.
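Copying the sections by hand is where mistakes happen, so here is a small Python sketch that assembles the duplicated prompt and refuses to put a LoRA tag in the shared part. `with_base_prompt` is a hypothetical helper written for this post, not something Regional Prompter provides; it only builds the text you would paste into the prompt box.

```python
import re

# Matches tags such as <lora:sksnico:0.7>.
LORA_TAG = re.compile(r"<lora:[^>]+>")


def with_base_prompt(shared: list[str], character: str) -> str:
    """Put the shared sections before and after ADDBASE; the character
    section (with its LoRA) goes only after it."""
    if any(LORA_TAG.search(section) for section in shared):
        raise ValueError("a <lora:...> tag in a shared section would be duplicated")
    base = "\n\n".join(shared)
    region = "\n\n".join([*shared, character])
    return f"{base}\n\nADDBASE\n\n{region}"


shared_sections = [
    "1girl, high_quality, 8k, masterpiece, (round_pupil:1.2), vivid_colors, "
    "(high_quality_eyes:1.2),",
    "(luxury_hotel_room:1.2), (bed:1.2), (tv:1.2), (plants:1.2),",
    "looking_at_viewer, (sitting:1.2),",
    "blushed, (naughty_smile:1.2), frown,",
    "(red_frilled_bikini:1.2),",
    "thighs, hips, cleavage, navel,",
]
character_section = (
    "(small_breasts:1.3), twintails, (red_twintail_ribbons:1.2), "
    "(sexy_body:1.2), bangs, <lora:sksnico:0.7>"
)

prompt = with_base_prompt(shared_sections, character_section)
print(prompt)
```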

Now we can just generate and get images that look like the anime style. It has been a long way to get here, trust us. You now have a good-looking base, and here are some comparisons between the Regional Prompter method disabled and enabled with the same character:

#1: Shioriko Mifune without Reg. Prompt. - Shioriko Mifune with Reg. Prompt.

#2: Maki Nishikino without Reg. Prompt. - Maki Nishikino with Reg. Prompt.

#3: Emma Verde without Reg. Prompt. - Emma Verde with Reg. Prompt.

For now, this is how it works. If you want to add, ask, correct or explain anything in detail, just write to us. We will be happy to help :)