Lowset LoRA/ Single Image LoRA
This is a series where I post my personal findings from training a LoRA with a small or practically non-existent dataset to see what I can come up with. Future posts will most likely no longer use a single image, but I am still looking for the minimum effort needed to retain as much detail as possible while keeping the LoRA flexible. The overall goal is to make new, consistent characters from generations.
As a general disclaimer, these findings may or may not work for you.
The prior articles can be found below:
Turnaround
In this article, I will explain the training process for my second OC, Enna. Her design is deliberately simpler so I can see what kind of mistakes I can get away with while training it for the first time. The training settings and captioning style are the same as in the previous articles.
Dataset Preparation
In this experiment, I created a turnaround using openpose and ControlNet. The turnaround consists of full-body front, back, and both side views. Stable Diffusion is not able to create a fully consistent turnaround on its own, so I had to rely on img2img and inpainting to get a consistent design. I will not be explaining the editing process used to make the turnaround consistent, since that depends heavily on your art and generation skills.
Afterwards, I upscaled the turnaround using MultiDiffusion and TiledVAE to about a 2k resolution. The key here is that you want a resolution high enough that you can add close-ups of any desired details without getting jagged pixels when you zoom in on the image. This doesn't mean that you should use as high a resolution as possible, since the useful detail is mostly limited by the image resolution value used in kohyaSS. Going beyond that value doesn't offer much benefit. (The image resolution used during training is usually 512x512 or 768x768. In my training, I use 768x768 to capture more detail. An overly high value tends to cause out-of-memory issues.) You do not need to resize your images if they are larger than the training resolution.
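As a rough sanity check on the "about 2k" figure, here is a minimal sketch of the arithmetic. It assumes square images and that a close-up is a plain crop covering some fraction of the source side; the function names and the 0.375 crop fraction are illustrative assumptions, not values from the training setup.

```python
import math


def min_source_side(training_res: int, crop_fraction: float) -> int:
    """Smallest source side (px) so that a crop covering `crop_fraction`
    of that side still has at least `training_res` pixels."""
    if not 0 < crop_fraction <= 1:
        raise ValueError("crop_fraction must be in (0, 1]")
    return math.ceil(training_res / crop_fraction)


def crop_is_large_enough(source_side: int, crop_fraction: float,
                         training_res: int) -> bool:
    """True if the crop would not need upscaling to reach the training resolution."""
    return source_side * crop_fraction >= training_res


if __name__ == "__main__":
    # Hypothetical close-up covering 37.5% of the image side, trained at 768.
    print(min_source_side(768, 0.375))
    print(crop_is_large_enough(1024, 0.375, 768))
```

Under these assumptions, a close-up that covers a little over a third of the image side is what pushes the source toward the 2k range; tighter crops would need a larger source still.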
Afterwards, I used the Character splitter tool from deepghs to automate getting a portrait shot and upper_body shot for my images.
Full body view. See the training dataset for the other viewpoints.
Other Dataset Images
I used the ControlNet augmentation and style training methods described in part 3. The captioning and repeats are slightly different. I set "keep n tokens" to 4, so I need 4 trigger words for each of my images.
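To illustrate why the number of trigger words has to match "keep n tokens", here is a simplified sketch of how the setting interacts with caption shuffling: the first N comma-separated tags stay in place and only the rest are shuffled. It assumes caption shuffling is enabled, which this article does not state, and it is a model of the behavior rather than the trainer's actual code. The tags after the first four are illustrative.

```python
import random


def shuffle_caption(caption: str, keep_tokens: int, rng: random.Random) -> str:
    """Shuffle comma-separated tags, leaving the first `keep_tokens` in place."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    fixed, rest = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(rest)  # only the tags after the trigger words move
    return ", ".join(fixed + rest)


if __name__ == "__main__":
    rng = random.Random(0)
    # First four tags are the trigger words; the rest are example tags.
    caption = "ennafol,generaloutfit,ascartstyle,1girl,white_dress,purple_eyes"
    for _ in range(3):
        print(shuffle_caption(caption, keep_tokens=4, rng=rng))
```

With fewer than 4 trigger words, an ordinary tag would end up pinned to the front of every caption alongside them.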
Captioning:
For the base image:
ennafol,generaloutfit,ascartstyle,1girl
placed in 7_enna folder
For controlnet augmentations for style:
ennafol,generaloutfit,otherstyle,1girl
placed in 2_enna folder
For controlnet augmentations for alternative costumes:
ennafol,otherfit,ascartstyle,1girl
placed in 5_enna folder
For the style training dataset:
flat_color, anime_screencap, style, ascartstyle
Overall structure:
1_style
2_ennafol (style augmentation; kept low to prevent learning wrong details)
5_ennafol (clothing)
7_ennafol (original images)
The images were autocaptioned with WD1.5 and then manually captioned to add more words. For the style augmentation folder, I did not prune words related to eye color and hair color, but I did prune those words in the folder with 7 repeats. I recommend downloading the training data to see how I captioned my dataset.
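The folder layout and base captions above can be sketched as a small helper. It assumes the repeats-prefix folder naming from the "Overall structure" list, PNG images, and one .txt caption per image; it only writes the base trigger words, so the autocaptioned and manual tags would still need to be added afterwards. The function names are illustrative.

```python
from pathlib import Path

# Folder name -> base caption, taken from the captioning section above.
LAYOUT = {
    "1_style": "flat_color, anime_screencap, style, ascartstyle",
    "2_ennafol": "ennafol,generaloutfit,otherstyle,1girl",
    "5_ennafol": "ennafol,otherfit,ascartstyle,1girl",
    "7_ennafol": "ennafol,generaloutfit,ascartstyle,1girl",
}


def repeats_of(folder: str) -> int:
    """Read the repeat count from a '<repeats>_<name>' folder name."""
    return int(folder.split("_", 1)[0])


def images_in(root: Path, folder: str) -> list:
    """PNG files in a dataset folder (empty list if the folder is missing)."""
    path = root / folder
    return sorted(path.glob("*.png")) if path.is_dir() else []


def write_base_captions(root: Path) -> int:
    """Write the base caption next to every image that has no .txt yet."""
    written = 0
    for folder, caption in LAYOUT.items():
        for image in images_in(root, folder):
            txt = image.with_suffix(".txt")
            if not txt.exists():  # never overwrite manual captions
                txt.write_text(caption + "\n", encoding="utf-8")
                written += 1
    return written


def images_per_epoch(root: Path) -> int:
    """Number of images seen per epoch once repeats are applied."""
    return sum(repeats_of(f) * len(images_in(root, f)) for f in LAYOUT)
```

Calling images_per_epoch on the dataset root gives a quick view of how strongly the 7-repeat originals outweigh the 2-repeat style augmentations.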
Training Settings:
Training settings are the same as the V4 version of Isabella. However, version 1.1 of the Enna LoRA uses a lower learning rate: 0.0001 overall, 0.00005 for the text encoder, and 0.0001 for the unet. That is about a tenth of the earlier values, or an extra zero added to each.
Optimizer: AdamW8Bit
Scheduler: Cosine with Restarts
Keep N tokens: 4
LR: 0.0001
Text: 0.00005
Unet: 0.0001
No augmentations enabled
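For readers who launch training from a script, the settings above could be expressed roughly as follows. This is a sketch that assumes the flag names used by the kohya sd-scripts LoRA trainer; check them against your installed version. Settings not listed in this article (network dimension, epochs, batch size, and so on) are deliberately left out rather than guessed.

```python
# Values from the settings list above; flag names are assumed to match
# the kohya sd-scripts command line and should be verified locally.
SETTINGS = {
    "optimizer_type": "AdamW8bit",
    "lr_scheduler": "cosine_with_restarts",
    "learning_rate": 1e-4,
    "text_encoder_lr": 5e-5,
    "unet_lr": 1e-4,
    "keep_tokens": 4,
    "resolution": "768,768",  # training resolution mentioned earlier
}


def to_cli_args(settings: dict) -> list:
    """Turn a settings dict into a flat ['--flag', 'value', ...] list."""
    args = []
    for key, value in settings.items():
        args += [f"--{key}", str(value)]
    return args


if __name__ == "__main__":
    # No augmentation flags are added, matching "No augmentations enabled".
    print(" ".join(to_cli_args(SETTINGS)))
```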
Observations:
Version 1.0 for Enna
A lot of concept bleeding with keywords such as behind, back, and side. Whenever one of them was mentioned, the LoRA always showed the back view. Adding more captions didn't seem to help.
Side view seems to have learned the side image from the dataset fairly well. The side view in my dataset isn't accurate but it's good to know the LoRA did retain the details well enough.
Color bleeding with blue and purple roses. At first, I believed it was an issue with the captioning, but after more testing, it seems the LoRA mixes these two colors frequently. It's potentially because my character has purple eyes, which could be causing a bleed effect.
Character is rotatable and outfits can be easily swapped
Doesn't seem to understand proportions well enough. In my generations, I usually get an older-looking version of my character. She's meant to be on the petite side rather than tall. The issue is more apparent with custom outfits.
Version 1.1 for Enna (Lower LR)
Concept bleeding issue with back views was resolved
Side view does not match the training data exactly, but the generated side view is more coherent with the original look
Still easy to swap outfits
Proportions issue is still apparent but can be fixed with proper prompt engineering, although that won't work with every checkpoint
Observations during Generation
Trigger Words vs Trigger Set:
This isn't discussed much, but the captions used in the training set act more like words in a trigger set that help the LoRA generate your character. For the character's original outfit, it's best to use the 'first n tokens' and refrain from using too many words from the trigger set that was used in the training data. For custom outfits, however, it's best to use more words from the trigger set to increase the chance of a better likeness. This somewhat implies that the person with access to the original dataset has better knowledge of how to use the LoRA, although it's possible to get by with the keywords found in the LoRA's metadata.
Random Thoughts
LoRA seems to underfit and produces an older look when using from_behind and bare_back
this applies to some checkpoints but not all
Adding hair color and eye color towards the end of the prompt can somewhat help with color bleeding
I had to avoid words like shirt and skirt and prefer words such as dress and white_dress to get the colors to match my character
oddly, the LoRA would try to produce a black skirt fairly often
Quite a number of checkpoints try to make my character older and taller than she normally is (this just suggests that the LoRA is underfit on proportions):
words such as flat_chest and petite helped
Works on my original checkpoint but I have to use LoRA block weights since my personal checkpoint is just a chaotic mix
Didn't find an instance of hatwear concept blend
LoRA seems to like adding a collared shirt
Still able to swap dresses without seeing concept blend of the original outfit
Back view of the coat is occasionally hit and miss
Adjusting the LoRA weight can reduce some overfitting and underfitting issues, but a low weight such as 0.2 seems to force the LoRA to generate blurry images
Seems to work best at a weight of 0.6 to 0.8; 1 also works well, although there is some overfitting on style and hand posture
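A quick way to compare weights is to generate the same prompt across the suggested range. The sketch below assumes the `<lora:name:weight>` prompt syntax used by the AUTOMATIC1111 web UI; the file name `enna_v1.1` is a placeholder, and only the eye color tag is filled in because the hair color is not stated in this article.

```python
def lora_tag(name: str, weight: float) -> str:
    """Format a LoRA tag in the assumed '<lora:name:weight>' syntax."""
    return f"<lora:{name}:{weight:g}>"


def sweep_prompts(base_tags, lora_name, weights, color_tags=()):
    """Build one prompt per weight, with color tags near the end of the prompt."""
    prompts = []
    for weight in weights:
        parts = list(base_tags) + list(color_tags) + [lora_tag(lora_name, weight)]
        prompts.append(", ".join(parts))
    return prompts


if __name__ == "__main__":
    base = ["ennafol", "generaloutfit", "ascartstyle", "1girl"]
    # Add your character's hair color tag next to purple_eyes as well.
    for prompt in sweep_prompts(base, "enna_v1.1", [0.6, 0.7, 0.8],
                                color_tags=["purple_eyes"]):
        print(prompt)
```

Keeping the seed fixed while sweeping makes it easier to see where blurriness or style overfitting starts.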
Things to try for Enna:
Check for impact of other outfits added to augmentation
Full body view of other clothing
I don't think there's much room left to improve for this LoRA.
Conclusion
EnnaV1.1 basically achieves everything that I wanted in a LoRA apart from the proportions issue, so I have partially achieved my goal with a small LoRA dataset. The one caveat with Enna is that I believe this only works because her character details are very simple, and I do not expect it to work with a more complex character design. I will spend some more time in the future using a more complex turnaround to see if these findings hold true. Well, thanks for reading!