Outfit LoRA when only 1 character wears the outfit in training images
I'm trying to make an anime LoRA/Lycoris model that captures a particular costume that a certain character wears. Let's call it Outfit X. I want to be able to use this LoRA in combination with other character LoRAs to make those other characters wear Outfit X.
The problem is that Outfit X is a design that comes from a particular character (let's call her C). I have about 55 carefully cropped and tagged images of C wearing X, and I could possibly obtain more. However, I could only find TWO anime images of Outfit X being worn by someone other than C. So my LoRA is having trouble separating the concept of Outfit X from Character C.
I did take some steps in image tagging to try to mitigate this problem. I created two new alphanumeric trigger words (call them Cch4r4ct3r and X0utf1t) to try to separate out the concepts. My tagging method for training images was this:
Each time Character C appeared in Outfit X, I tagged physical description details about C (e.g. blonde hair, brown eyes, ponytail, hair ornament) but did NOT tag details that were essential to Outfit X (e.g. blue shirt, yellow pants), although I did tag some of the accessories worn with the outfit since I might want to remove them in image generations. I then included BOTH the Cch4r4ct3r and X0utf1t tags.
Sometimes the image of C wearing X did not depict C's face, either because the image was from behind (showing the back of her head) or because the head was out of frame entirely. In these cases, I tagged all of Character C's physical features that were visible in the image (e.g. blonde hair, ponytail) and again did not tag details about Outfit X. However, this time I included only the X0utf1t tag (not the character tag, since her face wasn't visible).
I used the two images of other characters "cosplaying" in outfit X to generate a total of 5 training images (some upper-body 512x512, some full body 512x768, and one lightly edited). Each of these was tagged with full physical description (e.g. short hair, red hair, green eyes), no outfit description, and the X0utf1t tag.
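For concreteness, the caption scheme described above can be sketched as a small helper that assembles one caption per image (the function name and example tags are hypothetical, not from the real dataset):

```python
# Sketch of the caption scheme described above: character physique and
# accessories are tagged, essential outfit details are omitted, and the
# character trigger appears only when the face is visible.
def make_caption(phys_tags, accessory_tags, face_visible):
    tags = []
    if face_visible:
        tags.append("Cch4r4ct3r")   # character trigger only when the face shows
    tags.append("X0utf1t")          # outfit trigger on every image of Outfit X
    tags += phys_tags               # e.g. blonde hair, ponytail
    tags += accessory_tags          # removable accessories still get tagged
    return ", ".join(tags)

print(make_caption(["blonde hair", "brown eyes", "ponytail"],
                   ["hair ornament"], True))
# → "Cch4r4ct3r, X0utf1t, blonde hair, brown eyes, ponytail, hair ornament"
```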
This resulted in a LoRA that is good at producing images of C wearing X but bad at producing images of other characters wearing X. Yes, I can change the hair color by adding e.g. "black hair" to the prompt, and I can usually remove the character's ponytail by putting "ponytail" in the negative prompt. However, all the images I generate tend to share Character C's other less-easily-describable characteristics, like her facial structure and a certain way the hair looks that can't be described with a well-known tag. I also struggle to remove her prominent hair ornament even when I put "hair ornament" in the negative prompt (despite the fact that I included "hair ornament" as a tag in all training images where it appeared).
This means that when I combine my LoRA with another LoRA for some other character D, I struggle to produce an accurate image of D wearing X. Usually she looks like a fusion between C and D.
My main question is: How can I improve my LoRA's ability to transfer to other characters?
Perhaps this is not possible without more images of other characters wearing X. But I had a couple ideas based on the fact that there are lots of images of character C wearing various outfits other than X:
Download, crop and tag a bunch of images of C wearing other outfits. Since I'm not trying to train those other outfits, I could probably just describe them all (i.e. use "brown shirt, red pants, boots" instead of inventing a new tag like Y0utf1t) and include the Cch4r4ct3r tag. This would take a lot more time, though.
Download a ton of images of C wearing other outfits and use them as regularization images with just the tag "Cch4r4ct3r" (I have never tried using regularization images before).
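If you do try regularization images, trainers in the kohya-ss sd-scripts family read repeat counts from dataset folder names; here is a minimal sketch of such a layout, assuming that convention (the repeat counts are made-up examples):

```python
import os
import tempfile

# Hypothetical kohya-style dataset layout: the leading number in each
# subfolder name is the repeat count the trainer applies to that folder.
root = tempfile.mkdtemp()
for sub in ("train/4_X0utf1t",      # ~60 outfit images * 4 repeats per epoch
            "reg/1_Cch4r4ct3r"):    # regularization: C in other outfits, 1 repeat
    os.makedirs(os.path.join(root, sub))
print(sorted(os.listdir(os.path.join(root, "train"))))  # → ['4_X0utf1t']
```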
Please let me know if either of these ideas makes sense, or if there's a better method for this scenario that I'm missing!
A couple other related questions about outfit LoRAs while I'm at it:
A. What kind of model should I use for an outfit? I've been told that LoCons are good at capturing fine details, but I've never tried them. My current LoRA seems to be good but definitely could use some improvement.
B. What kind of "class tag" should I use for an outfit? I've been told that adding a class tag can improve performance, but I don't think "outfit" is a standard booru tag, and I don't know of any standard tag that's generic enough to include the type of outfit that X is. Or does this not matter?
Don't use special tags; I know there are a number of guides that suggest this, but it's totally unnecessary, and removing the normal tags results in a less flexible model, which is where you're running into your particular problem. The AI associates what it's learning with what it already knows. If your training tag for the outfit is simply 'Outfit X', this is new to the AI, so it doesn't know how to separate this concept from the character without seeing other characters wearing 'Outfit X'. On the other hand, if you have something like '1girl, blonde hair, blue eyes, etc' wearing 'crop top, microskirt, highleg panties, etc', those are ideas the AI is already familiar with, so it's better able to pick these things apart. Then, in the end, in order to have another character wearing 'Outfit X', you would prompt the appropriate LoRAs, as well as the tags for 'Character D' and 'Outfit X', but not for 'Character C'.
TLDR—tag your dataset as if you were uploading it to a booru.
As has been said before: don't use custom tags. It's much worse than using the booru tags. Automatically tagging the images and then removing every tag that refers to the outfit should be enough. (The activation tag is another story: models usually work without one, but if you use one, put it at the front of the caption.)
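A minimal sketch of that post-processing step, assuming the auto-tagger returns a plain list of tags (the outfit blocklist here is a made-up example of what it might emit for Outfit X):

```python
# Post-process auto-tagger output: drop tags that describe the outfit,
# keep everything else, and put the activation tag first.
OUTFIT_TAGS = {"blue shirt", "yellow pants"}   # hypothetical outfit tags

def clean_caption(raw_tags, activation="X0utf1t"):
    kept = [t for t in raw_tags if t not in OUTFIT_TAGS]
    return ", ".join([activation] + kept)

print(clean_caption(["1girl", "blonde hair", "blue shirt",
                     "yellow pants", "ponytail"]))
# → "X0utf1t, 1girl, blonde hair, ponytail"
```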
I also recommend training the model slowly (learning rate 1e-4, with 200-300 images × repeats per epoch). That way you are far more likely to find the specific checkpoint that consistently creates the correct outfit while still allowing you to change other details.
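The "images × repeats" budget is simple arithmetic; here is a quick sketch using the roughly 60-image dataset from the question (the function is a hypothetical helper, not part of any trainer):

```python
# Pick the smallest repeat count that puts images * repeats into the
# suggested 200-300 range.
def repeats_for_budget(n_images, target_low=200, target_high=300):
    r = max(1, -(-target_low // n_images))   # ceiling division
    assert n_images * r <= target_high, "no repeat count fits the range"
    return r

n = 55 + 5                 # C-in-X images plus the cosplay-derived ones
r = repeats_for_budget(n)
print(n, r, n * r)         # → 60 4 240
```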
Note that I don't have experience with costume LoRAs, but this works with character LoRAs where all the images in the dataset shared the same strong artstyle.
Some more things you could try: put more weight on the images without Character C, or bootstrap more variance into the dataset. Train a first model and generate images until you have a few without Character C, incorporate those into the training data, and train a new model with more character variance.