
OC LoRA Part 3 - ControlNet Augmentation

Normally, I would have published this earlier, but I was fairly busy with other things and didn't get around to writing an article. Just a warning that there may be some details I missed due to the late write-up, but you can check my dataset and LoRA metadata for anything missing.

General Disclaimer: This is a log of my discoveries while training these LoRAs. This may or may not work for you. It is more intended for making consistent characters from an AI generation rather than making an image of an existing character. I am not sure whether things in this guide can be applied to other character LoRAs.

Lowset LoRA / Single Image LoRA

This is a series where I post my personal findings from training a LoRA with a low or practically non-existent dataset to see what I can come up with. Future posts will most likely no longer be single-image, but I am still looking for a minimum-effort way to retain as much detail as possible while keeping the LoRA flexible.

The prior articles can be found below:

  1. Part 1

  2. Part 2

IsabellaV4 Augmented

I should have written an article about this LoRA a month ago, but I was backlogged. I tried playing around with the advanced kohyaSS settings at first but didn't get any good results. A lot of the settings seemed to cause the LoRA to undertrain, so toying around with all of them wasn't very productive.

So in order to make better use of my time, I decided to try a more novel approach. Specifically, I used a variation on AI_Character's approach for training LoRAs in XL.

The short explanation of the approach is to teach for style instead of using regularization images.

Luckily, my regularization images have the same artstyle as my base image so I just moved my regularization folder into the main dataset.

In the main dataset:

The main benefit of this approach is that it seems to improve character likeness around the face and eyes, along with complex poses such as from_side. Regularization images seem to always cause the eyes to underfit and retain the original eye styling of the checkpoint, which may or may not be something you prefer. Personally, I like eye shapes closer to my original dataset, so I prefer this approach over regularization images. In addition, teaching for style also seemed to help the LoRA learn general anatomy, allowing the character to be rotated instead of being fixed in a single pose.

Caveats

I experimented with this approach for some other LoRAs and found some issues that require a level of caution. First, I don't recommend adding hand poses to the style dataset: if too many similar hand poses are in the dataset, the LoRA seems to learn the hand pose as well, and captioning doesn't seem to help with this in my experience. It appears that if the LoRA sees too much of a certain concept, it will overfit on it. Second, be careful of the lighting that appears in the dataset. I had one case where the dataset was too bright and that carried over into the LoRA; luckily, adding lighting captions seemed to help remove the effect. Third, try to keep the hairstyles relatively similar to your character. I had a case where I used a LoRA to help generate a style dataset, but then certain aspects of the side hair started appearing in my generations.

In summary:

Benefits:

  • Better likeness in face and eyes

  • Better likeness for posing from_side

  • Allows LoRA to learn poses

Downsides:

  • Increased Style Bleed

  • Can unintentionally learn unwanted concepts (hand poses, lighting, hairstyles)

Captioning the style dataset:

For the kohyaSS training, I have my keep n tokens setting at 4. This is largely because I use multiple trigger words for outfits and keep different styles in the dataset. Currently, the style dataset is captioned with "flat_color, anime_screencap, style, ascartstyle" as the main trigger words. The most important one is 'ascartstyle', a custom keyword to help the LoRA learn the style; the other three are just keywords that describe the style. I seem to get better results with my custom keyword as the 4th token, although I have no idea how to explain this. However, the key takeaway from style training is that you should not have keywords such as '1girl' and 'solo' as the trigger words for style, since this will cause character details from the style dataset to bleed over into the LoRA.
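
As a rough illustration of how the captions are laid out (the file path and the helper function here are hypothetical, not part of my actual workflow), the trigger set goes at the front of each caption file so that keep n tokens = 4 protects it from caption shuffling:

```python
from pathlib import Path

# Hypothetical helper: writes a caption .txt next to a style image with the
# trigger set as the first four tokens, so "keep n tokens = 4" keeps them
# fixed when caption shuffling is enabled.
STYLE_TRIGGERS = ["flat_color", "anime_screencap", "style", "ascartstyle"]

def write_style_caption(image_path: Path, extra_tags: list) -> None:
    tokens = STYLE_TRIGGERS + [t for t in extra_tags if t not in STYLE_TRIGGERS]
    image_path.with_suffix(".txt").write_text(", ".join(tokens), encoding="utf-8")

# Example (hypothetical file and tags describing the image content):
write_style_caption(Path("dataset/1_style/0001.png"),
                    ["upper_body", "outdoors", "sunlight"])
```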

ControlNet Augmentation

I noticed that after reaching about 6-10 images with the same artstyle in the main dataset, the style begins to bleed over too much. I have had some fairly bad results where I added too much of the same image into the dataset, which caused the end result to warp or have very bad lighting effects. In addition, having more close-up images is fairly crucial for increasing character detail, but this poses a problem where I have to trade flexibility for fidelity. Toying around with training settings did not seem to help, so I resorted to using ControlNet to create style variations of my dataset.

I used an upper body shot since Stable Diffusion generally struggles with full body images unless the resolution is high enough, which is very time-consuming. Unfortunately, it's not a one-shot process, as I had to cherry-pick the ControlNet generations with proper framing and coloring. Adding images with incorrect colors will usually cause the LoRA to learn the wrong colors. Quite interestingly, lower body shots do not transfer style well.

(Left: Original Image | Right: Augmented using ControlNet with Incorrect Coloring) I recommend downloading the training data to see what I used. There is some level of leniency, but it's hard to pin down which rules you can bend.
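
I did this screening by eye, but as a purely illustrative sketch (the file names and the threshold are arbitrary assumptions, not part of my process), you could get a rough first pass on color drift by comparing the mean colors of the original and an augmented image:

```python
import numpy as np
from PIL import Image

def mean_rgb(path: str) -> np.ndarray:
    """Average RGB color of an image, downscaled for speed."""
    img = Image.open(path).convert("RGB").resize((64, 64))
    return np.asarray(img, dtype=np.float32).reshape(-1, 3).mean(axis=0)

# Hypothetical file names; the threshold of 30 is an arbitrary guess to tune by eye.
drift = float(np.linalg.norm(mean_rgb("original.png") - mean_rgb("augmented.png")))
if drift > 30:
    print(f"Large mean-color drift ({drift:.1f}) - inspect this augmentation manually.")
```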

ControlNet Style Settings

There isn't a set guideline for the weights since they can differ per checkpoint. Regional Prompter can help with getting some of the colors correct, but it's not 100% accurate.

Checkpoints used:

  • Aurora

  • BlueMix

  • AnyLoRA

  • DarkSushiMix

  • WintermoonMix

  • BreakDomain

  • AbyssOrangeMix

The idea is to use a variety of mixes with different artstyles.

ControlNets:

  • ControlNet Canny:

    • Weight: 0.2 ~ 0.5; End Guidance: 0.5

  • ControlNet Lineart + AnimelineDenoise:

    • Weight: 0.2 ~ 0.5; End Guidance: 0.5

  • ControlNet T2I Adapter: Color:

    • Weight: 0.5; End Guidance: 0.5

I am using lower weights and an earlier end guidance value since I noticed that a weight of 1 doesn't seem to allow the style to transfer easily. I don't recommend using reference_only here since that ControlNet tends to cause a lot of style bleeding.
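
For reference, these settings correspond to the per-unit Weight and Ending Control Step values in the WebUI. If you wanted to batch the generations through the A1111 API instead of the UI, a rough sketch of the payload could look like the following; the endpoint, model filenames, prompt, and exact argument schema depend on your install and ControlNet extension version, so treat this as an assumption rather than a recipe:

```python
import base64
import requests

# Assumes a local A1111 WebUI started with --api and the ControlNet extension
# installed; the model filenames below are assumptions that depend on your setup.
API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"

def b64(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

reference = b64("isabella_upper_body.png")  # hypothetical source image

payload = {
    "prompt": "1girl, solo, upper body, looking at viewer",
    "negative_prompt": "lowres, bad anatomy",
    "steps": 28,
    "width": 512,
    "height": 768,
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                # Canny at low weight with an early cutoff so the new checkpoint's style can take over
                {"input_image": reference, "module": "canny",
                 "model": "control_v11p_sd15_canny",
                 "weight": 0.3, "guidance_start": 0.0, "guidance_end": 0.5},
                # T2I color adapter to keep the palette roughly correct
                {"input_image": reference, "module": "t2ia_color_grid",
                 "model": "t2iadapter_color_sd14v1",
                 "weight": 0.5, "guidance_start": 0.0, "guidance_end": 0.5},
            ]
        }
    },
}

images = requests.post(API_URL, json=payload, timeout=600).json()["images"]
# images are base64-encoded PNGs to decode, review, and cherry-pick by hand.
```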

Impact:

  • There is a noticeable impact on reducing style bleed, but some style bleeding does remain. (The super lighting effect seems to have vanished.)

  • Overfits on the torso

  • Requires lower weight at 0.5 ~ 0.6 to work well.

  • The LoRA is still workable but requires more effort on the prompt engineering side.

  • I have made some cursed images of a torso with boots.

  • Better detail capture

  • Reduced overfitting on eye position (might not apply for every case)

Personally, I think adding ControlNet augmentations to reduce style bleeding can work, but it seems that to reduce style bleeding further, I would need more viewpoints of the character.

ControlNet for Outfit Augmentation

One of the things that I noticed with a lower dataset is that the torso and headwear are often overtrained and generally very difficult to remove via prompting. So, in order to make a low-dataset LoRA flexible with outfits, I needed to create images of my character in revealing clothing. Fortunately (or unfortunately?), it is very easy to remove clothing from characters using MultiControlNet; I wrote another guide explaining this in detail here. For my dataset, I added images of my character in underwear and swimwear. I speculate that to make clothing removable, you probably want something that exposes the shoulders and arms at a minimum.

When removing hatwear or accessories, pay attention to the blurred discolorations caused by lama cleaner. If the discoloration is in the hair, the LoRA could accidentally perceive your character as having multicolored hair and add it to the end result. I recommend running multiple generations until you get a ControlNet result where the hair isn't too discolored.

Personally, I found that adding three images was enough to get the LoRA flexible enough at changing clothing, but occasionally the hat would underfit for some reason. I found that it's rather easy to verify outfit augmentation results, since AbyssOrangeMix tends to overfit on the torso and headwear.

Training Settings:

Repeats:

  • 1_style

    • "flat_color, anime_screencap, style, ascartstyle" as trigger set

  • 5_isabella (skirt and underwear augmentations)

    • "isabellanorn, 1girl, alternativeoutfit, solo" as trigger set

  • 10_isabella (original image and upper_body augmentation)

    • "isabellaNorn, generaloutfit, ascartstyle, 1girl" as trigger set

Personally, I would have placed the upper_body augmentations in a folder with lower repeats, but I had to remove several images where the coloring was incorrect, so I placed the smaller dataset in the folder with higher repeats. "generaloutfit" and "alternativeoutfit" are trigger words used to help the model tell the difference between multiple outfits. "ascartstyle" is more of an attempt to soak up the artstyle of the LoRA; I don't use it at all in my generations. I have tried placing it in the negative prompt in hopes of reducing style bleed, but that didn't have any impact.
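
To make the repeats layout concrete, kohya reads the repeat count from the "repeats_concept" folder name prefix. A throwaway sketch of setting up the structure above (the root path is a hypothetical placeholder):

```python
from pathlib import Path

# kohya reads the repeat count from the "<repeats>_<concept>" folder name.
# The root path is a hypothetical placeholder for wherever your dataset lives.
DATASET_ROOT = Path("training/isabella_v4/img")

FOLDERS = {
    "1_style": "flat_color, anime_screencap, style, ascartstyle",
    "5_isabella": "isabellanorn, 1girl, alternativeoutfit, solo",
    "10_isabella": "isabellaNorn, generaloutfit, ascartstyle, 1girl",
}

for name, trigger_set in FOLDERS.items():
    (DATASET_ROOT / name).mkdir(parents=True, exist_ok=True)
    print(f"{name}: captions start with '{trigger_set}'")
```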

Kohya SS settings

  • Scheduler: Cosine with Restarts

  • Optimizer: AdamW8Bit

  • LR: 0.001

  • Unet: 0.001

  • Text: 0.0005

  • Removed Block Weights
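
For anyone running kohya's sd-scripts from the command line instead of the GUI, the settings above map roughly onto the following flags; the base model path, dataset paths, and network rank/alpha are placeholders rather than values from this training run:

```python
import subprocess

# Rough command-line equivalent of the settings above for kohya's sd-scripts.
# Model/dataset paths and network rank/alpha are placeholders, not my actual values.
cmd = [
    "accelerate", "launch", "train_network.py",
    "--pretrained_model_name_or_path", "models/base_checkpoint.safetensors",
    "--train_data_dir", "training/isabella_v4/img",
    "--output_dir", "training/isabella_v4/output",
    "--network_module", "networks.lora",
    "--network_dim", "16", "--network_alpha", "16",   # placeholder rank/alpha
    "--optimizer_type", "AdamW8bit",
    "--lr_scheduler", "cosine_with_restarts",
    "--learning_rate", "0.001",
    "--unet_lr", "0.001",
    "--text_encoder_lr", "0.0005",
    "--keep_tokens", "4",
    "--shuffle_caption",
    "--caption_extension", ".txt",
    "--resolution", "512,512",
]
subprocess.run(cmd, check=True)
```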

Other Attempts:

  • Lowering LR: Drop in detail quality but more flexible

  • 32/32 Network Rank: Latent Noise (Complete failure)

Things yet to try:

  • Lower LR + 32/32 Rank

  • Negative Concept Training (Training the LoRA for a concept to be specifically used in the negative prompt)

Overall Results:

  • Easier to change outfits

  • Better hairstyle capture with custom outfits

  • Original outfit is harder to prompt for

  • Hand position is overfit

  • Requires lower weight

  • Eye style likeness is similar to the dataset

  • Stripes on the original outfit are occasionally incorrect

  • Lower style bleeding (Still present on some checkpoints)

  • Better detail capture

Other Observations:

  • Doesn't seem to understand proportions yet due to lack of a full body view without any clothing?

  • Doesn't seem to understand asymmetry

Concluding Remarks

I am probably finished with my Isabella LoRA until I can find a solid method to create a side view without too much editing or inpainting. Personally, I'm at the limit of what a single viewpoint can do, and ControlNet augmentation seems to require multiple viewpoints to help reduce the style bleeding effect. From my second OC test, I found that it is possible to make a fairly accurate and flexible LoRA for a character with a simpler design, although it still has its own problems.

Thanks for reading, and I hope my musings were insightful.
