Using shoes to determine needed number of images for a LoRA

In my earlier training guide I remarked that nobody (that I’m aware of) had tested training the same LoRA repeatedly while removing images from the training data each time to figure out what the real minimum is. I still don’t have that, but I have realized (actually I've been sitting on this 75% finished for at least a month) that I already have a good test of LoRAs with limited data sets: Footwear!

How does footwear tell us how many images a LoRA needs? It’s actually quite simple: For most LoRAs, the majority of the training data is of the upper body, so the shoes are going to be undertrained a lot of the time. Additionally, tagging for footwear is much less developed than tagging for outfits, so the tags aren’t going to do the heavy lifting for most characters. Finally, shoes are often the most inconsistent part of a character even when they are in the data set, so the data is often contradictory even in otherwise good art. Since so many of my character LoRAs are for fairly obscure characters where the data set is literally every pic I could find, I’m not going to feel ashamed to admit a character’s shoes aren’t as consistent as I’d like when just getting a full data set at all was impressive. I don’t recall ever using flip augmentation on my character LoRAs (though I may have on something), since most characters I do have at least some asymmetry in the hair, so I don’t need to account for that.

I have gone through my previous character LoRAs and their training data (TD) to list the following information:
Complexity: How complex/unique is the character’s footwear? A subjective rating of 1-5.
Consistency: Is the character’s footwear consistent in the training data? Only footwear that needs the same tag is counted (so one-off swimsuit pics with sandals instead of the character’s normal boots are just counted as not having the shoes in them). Another subjective rating of 1-5.
Count: How many images in the training data have the footwear present, and what percent is that? Does not count socks alone being visible.
Tags: How many tags are used for the character’s footwear? Purely supertypal tags are not counted (e.g., “boots” doesn’t add to the count if “ankle boots” is already used). The number following a + is the number of tags for the character’s socks (including pantyhose, tights, etc. for this purpose).
Result: How do I feel about the LoRA’s ability to generate the character’s shoes? A subjective rating of 1-5 stars.
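
For anyone who wants to keep the same notes on their own LoRAs, the five fields above could be recorded roughly like this. It is only a minimal sketch: the class name `FootwearRating`, its field names, the `count_tags` helper, and the `SUPERTYPES` mapping are all hypothetical names made up for illustration, and the mapping holds just the one supertype pair mentioned in the Tags rule. The sample values are copied from the Tara Grimface entry further down.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical map from a specific tag to the broader tag it already implies.
# Only the example pair from the Tags rule is filled in.
SUPERTYPES = {"ankle boots": "boots"}


def count_tags(tags):
    """Count distinct tags, skipping supertypal tags implied by another tag."""
    unique = set(tags)
    implied = {SUPERTYPES[t] for t in unique if t in SUPERTYPES}
    return len(unique - implied)


@dataclass
class FootwearRating:
    name: str
    complexity: float          # subjective, 1-5
    consistency: float         # subjective, 1-5
    shown: int                 # training images with the footwear present
    total: int                 # all training images
    result: Optional[float]    # subjective stars, None if it can't be judged
    shoe_tags: List[str]
    sock_tags: List[str] = field(default_factory=list)

    @property
    def percent(self):
        """Share of the training data that shows the footwear."""
        return 100 * self.shown / self.total

    @property
    def tag_summary(self):
        """Tag count in the 'shoes+socks' notation used in this article."""
        shoes = count_tags(self.shoe_tags)
        socks = count_tags(self.sock_tags)
        return f"{shoes}+{socks}" if socks else str(shoes)


tara = FootwearRating(
    name="Tara Grimface",
    complexity=4,
    consistency=5,
    shown=14,
    total=29,
    result=5,
    shoe_tags=["boots", "black footwear"],
)

# Prints: Tara Grimface: 14/29 (48%), tags 2
print(f"{tara.name}: {tara.shown}/{tara.total} "
      f"({tara.percent:.0f}%), tags {tara.tag_summary}")
```

Keeping the raw tag lists rather than just the number makes it easy to recount later if the supertype rule changes.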

Ranking LoRAs

Example:
Usagi Kurokawa
Complexity: 1/5. Minimal complexity, just black sneakers, white tip, soles, and laces.
Consistency: 5/5. Very high: Tip size varies slightly between artists which is nitpick level.
Count: 14/24. A little over half (about 58%).
Tags: 2. One defining type (sneakers), the other defining color (black footwear).
Result: 5/5. As expected of a simple, consistent design with lots of training data, Usagi’s shoes come out consistently.

Now the rest of the characters in order of creation. I’m going to exclude those made entirely from comic based data sets because I didn't follow my normal data set steps for those.

Ella
Complexity: 2/5. Not trivial to describe, but very mundane shoes I’m sure an expert on shoe terminology could describe very well with text alone.
Consistency: 5/5. Since all but one piece of training data with shoes is official art or a pic of the 3D model, there’s no variance at all.
Count: 14/18. Even have a good pic of the soles in there.
Tags: 1+2. “red shoes”+length and color of socks
Result: 5/5. The overall LoRA is a bit stiff but entirely functional, and the shoes aren’t the issue at all.

Karis (1.0)
Complexity: 3/5. They’ve got some very specific features, but nothing outrageous.
Consistency: 4/5. The boots often blur the line between strapped boots and puttee over boots, but they are mostly consistent in that ambiguity. Two pictures (one from manga, one from a promo art holding her staff in front of her) are unambiguously puttee over boots now that I look at them for the purpose of hyper examining the shoes, with no room for doubt.
Count: 9/21. Some are mostly obstructed showing only the top part though.
Tags: 1+1. White boots+black pantyhose
Result: 4/5. PDXL has decided they are 100% boots without puttee and rejects the ambiguity, but it matches most of the data.

Misa Satsukino
Complexity: 1/5. I was going to say two stars for complexity, but I accidentally genned a pic without the LoRA loaded and it came out very close except for the toe.
Consistency: 5/5. Hard not to be with such limited data, all from official sources. Now that I actually think about it, I’m not sure she even had shoes as part of her design before the OVA.
Count: 4/59. Lowest percent yet. In retrospect I should have split the character sheet into three pictures, one for each direction, but too late now (will fix for ILXL ver). It may help that the shoe pics it does have are from multiple angles and close ups.
Tags: 2+2. Color+style for both
Result: 3.5/5. The tip and the nature of the ankle aren’t consistent, but overall it works, especially with how limited the data is.

Tara Grimface
Complexity: 4/5. Some unusual features, but nothing one would call overdesigned.
Consistency: 5/5. All but 2 pics with shoes are from the same 3D model, and both exceptions were official art.
Count: 14/29. Bunch of full body pics from different angles.
Tags: 2. Boots+black footwear.
Result: 5/5. Another victory for high data count

Misaki
Complexity: 3/5.
Consistency: 4/5. The overworld model and pre-battle model actually differ on the nature of the top part (much closer to the body on the overworld model), so we have official sources differing in the data set on a major detail.
Count: 6/26. Multiple directions, however all but 2 pics are from low poly DS models.
Tags: 1+3. Well described socks, but just used “boots”
Result: 4/5*: It will generally draw them matching what you want, but there’s a huge asterisk here: The footwear will often be overwritten by other LoRAs (even style ones!) due to the simple tagging. This normally wouldn’t be an issue, but due to the undesirable art styles for most of the data set (simple sketches and low poly models) this one really wants to be paired with a style LoRA.

Ellie
Complexity: 3/5. Couple of strong details, but nothing strange
Consistency: 2/5. The cutscenes don’t have the same proportions as the still art, and they’re missing the white heel. The one full body art piece of the GG version doesn’t look anything like the Saturn version except the cuff.
Count: 6/23. In addition to the non-matching GG art, 3 pics are from the hyper compressed cutscenes and from a distance, which can’t be good for the quality.
Tags: 1+1. One that defines type and color, pantyhose type and color.
Result: 1/5: Very inconsistent.

Pratty
Complexity: 4/5. The buttons/orbs/whatever they are are a bit strange, but not overdesigned.
Consistency: 4/5: What the spheres are isn’t consistent and several pics are angled to hide the one on the topside.
Count: 7/24: Decent number.
Tags: 2+1: ankle boots+color. In retrospect I could have added “fold-over boots”.
Results: 2/5. You can see shades of the boots she has in the training data, but it’s generally not the same thing.

Razzy
Complexity: 3/5. The belt is a strange feature, but not a complex one.
Consistency: 4/5. Some deviation within fanart.
Count: 7/20. Also a pic of her in her dress with a different pair of boots.
Tags: 3+1: Color, length, and feature (fold-over) as well as black socks
Results: 3/5. Leaves something to be desired in consistency, but their core does appear.

Sanary
Complexity: 3/5 because of the ring on the top front. Otherwise 2 or 1.
Consistency: 5/5. It turns out all the pics of Sanary’s boots I have that aren't official art were from a single artist.
Count: 7/22. One is from behind and doesn’t show the ring and another is very chibi and low res.
Tags: 3. Color, length, and feature.
Results: 4/5: It gets it right most of the time, except it’s either missing that ring or thinking it’s a ribbon.

Rifmonica
Complexity: 2/5: Very simple shoes with one unusual detail (the white ring tongue)
Consistency: 4/5: Two of the pics are totally wrong on the shoes, otherwise all good.
Count: 11/30. 4 are official art.
Tags: 1+2. They’re just “pink shoes”
Results: 4/5. The only part that isn’t consistent is that tongue.

Murno
Complexity: 1/5. Like Rif, but without the tongue
Consistency: 5/5. Not as good as it seems because...
Count: 2/21. Technically 3, but that one is so chibi as to be useless.
Tags: 1+3: White shoes
Results: 3/5. Better than you’d expect, but a glaring issue (albeit one that could be fixed).

Rufeel
Complexity: 2/5. Simple shoes, with a small embellishment.
Consistency: ???. See count.
Count: 4+6. I’ve got 4 good quality ones, but 6 very chibi ones with no detail.
Tags: 1. This one was chosen poorly.
Results: ???. This one is weird. The detail is correct, but the shoes incorrectly appear backless because the tag I used was “slippers”. I suspect it wouldn’t have that issue with a different tag (I’m going to swap the tag when I retrain for ILXL).

Mana
Complexity: 1/5. Very simple.
Consistency: 4/5. Only some chibi pictures aren’t on model
Count: 6+2. A couple of chibi shoes, but also pics from every angle
Tags: 2+1. Color and type of shoe.
Results: 5/5. The only issue is checkpoints trying to make the shoe more complex than it really is with an added metallic buckle.

Orochi (Orochi and Rin)
Complexity: 1/5 and 2/5. Orochi’s shoes are just basic, black shoes. Rin’s shoes have a small embellishment on back.
Consistency: 5/5 and 4/5. The one non-official pic of Rin I have misses the embellishment but is otherwise on model.
Count: 4 and 3. Not a lot for either.
Tags: 1 and 2+2. Orochi is just “black shoes” because they’re so simple it’s hard to describe them. Rin is type+color on both
Results: 4.5 and 4/5. Orochi needs laces as a negative tag, and Rin’s exact colors/back of shoe aren’t consistent but are very close.

Karin
Complexity: 4/5. Not over designed, but pretty specific details
Consistency: 2/5: The official art piece of her is the sole authoritative piece on what her shoes look like (the in-game sprite is way too tiny to tell), and it’s vague/crude enough everyone varies on the details. The rough shape is consistent though.
Count: 5/19: This character didn’t have a lot of TD and I only made her because the rest of her outfit was pretty simple.
Tags: 1. Just “shoes”
Result: 3/5: Given the issues mentioned above, I’m surprised they come out as consistent as they do, even if they only mostly resemble the official art.

Sheba
Complexity: 3/5. A seemingly simple design with two very specific details (the point and cylindrical/tubular top part)
Consistency: 3/5: The point is applied in only one piece of fan art (though many aren’t at an angle/camera to even show it). The general shape of the upper part is generally applied, but with inconsistent details (flare, exact transition between the two parts).
Count: 8/24, but 2 of them barely show the boot
Tags: 2. Knee boots and brown footwear.
Results: 3/5. The inconsistent joining of the two parts really hurts here. The point will sometimes apply, but not consistently.

Dagoth Ur
Skipped: He doesn’t wear shoes, so I can’t rate his shoes.

Azlier Levinos
Complexity: 5/5. Very specific style of fantasy armor. Were there any details on the leg armor itself, I’d give it a 6.
Consistency: 4/5. As well as you could hope with this level of detail
Count: 7/39 (2 chibi), plus 2 or 4 more that only show the top part of the greaves.
Tags: 2. Greaves and metal boots
Results: 4/5 on the metal boots themselves (knee and under), 3/5 on the thigh greaves, which were the least consistent part of the training data.

Rena
Complexity: 4/5: The individual pieces aren’t complex, but their combination is.
Consistency: 4/5: Surprisingly good on footwear. A few are missing the ankle ribbon and pouch or drew it weird.
Count: 8/29.
Tags: 2+6: A rare case where the footwear has lots of well defined parts. (Note: I did not include the tag “asymmetrical footwear” in the training because I didn’t register the ribbon and pouch as covered by the tag.)
Result: 4.5/5. The only issue she has is the asymmetrical parts don’t consistently have proper asymmetry. This can be chalked up to my mistake of not including the asymmetrical footwear tag (which I’ll correct in the ILXL conversion).

Atsuko (main outfit and kimono)
Complexity: 2/5? Low res, low detail and blurry, 1/5 on alt outfit’s sandals
Consistency: 4/5. Consistently awful, with one shot (when she’s inside the castle) looking like there’s supposed to be more detail on the front but it’s too much of a mess to tell. One shot of the front. 5/5 for alt outfit
Count: 4/36, 2/36 (which are near identical) for the sandals. Also there's one picture where she wears her main outfit and has socks on without shoes.
Tags: 1. Sneakers. 1+1 (white socks+sandals)
Result: ?/5. Look like base model’s understanding of sneakers with slight preference toward color. I can’t actually judge this. I have no idea what they’re supposed to look like. The sandals also appear to be what they’d look like with no LoRA, except it captures how thin they are and you can tell there isn’t any detail you or the LoRA missed, so 5/5 there.

Karis (V2)
Complexity: 3/5. As before, they’ve got specific features, but nothing truly complex.
Consistency: 4.5/5. The extra images I added all go toward them being just boots
Count: 16/32. Not necessarily all fully in view.
Tags: 2+1. White boots+knee boots+black pantyhose
Result: 5/5. Consistent with itself and a reasonable match.

Karst
Complexity: 2/5. Standard fantasy boots except for the spur holder thing.
Consistency: 4/5. It’s unclear in the official art piece if the band at the top is a fold-over boot or an extra layer of material at the top. Fan art prefers the extra material idea, but otherwise it’s hard to contradict the details.
Count: 11/30, but only 7 show beyond the top
Tags: 2 or 3, depending on if we count the gap between the boots and skirt as a tag about the boots.
Result: 4/5. Almost all there. The top could be more consistent. Tempted to add fold-over boots and spurs to the ILXL version’s tags.

Alma (uniform and pilot suit)
Complexity: 3/5 and 2/5. It’s more the details of her uniform shoes than the shape that’s complex.
Consistency: 5/5. Except for some fanart of the pilot suit (on model and added for the purpose of getting the footwear right) everything was official media.
Count: 8 and 5 of 60 (and many were partially obscured).
Tags: 2+2 and 2. Note the shared tag for distinct pieces of clothing.
Result: 3/5. They do OK. They often come out as a 4/5, but it’s too inconsistent.

Mia (uniform and pilot suit)
Complexity: 2/5 and 2/5. Her uniform boots are slightly less complex
Consistency: 5/5. Like Alma, all official art or on model fan art.
Count: 5 and 5 of 46. Like Alma the uniform shoes are obscured in a lot of pics.
Tags: 2 each. Ankle Boots and color.
Results: Low 3/5 on both. The two different boots blend together too much due to lack of a distinguisher like Alma’s socks.

Helena (uniform and pilot suit)
Complexity: 2/5 and 2/5. As above.
Consistency: 5/5. As above
Count: 6 and 1 of 46.
Tags: 2 each. After the above two I changed the normal uniform to plain “boots”
Results: 4/5 on main (1/5 without lace-up boots as a negative), 2/5 on the pilot suit boots (surprisingly; I think part of it is that pilot suit is in the base model).

Lilith
Complexity: A strong 3/5. They’ve got distinct sci-fi elements in how the two parts meet.
Consistency: 4/5. One shot is a distance shot with low detail.
Count: 3 of 22
Tags: 2. Color and boots.
Results: 2/5. It’s close but the details are lacking. Length is a big issue.

Conclusions

Simple costume pieces with a basic tag can be taught (at least to PDXL) to a reasonable degree with as few as 3 good pictures. The most complex outfit pieces/details want 8+ at the high end.
Low consistency on training data is an absolute killer. It may be worth cropping out such improper details when possible over mistraining them.
Tagging things as much as you can is a benefit.
Use more specific tags when they’re available and applicable. Details without tags are the least likely to be learned and, apart from any help in learning, footwear tagged with only a common supertypal tag like “boots” is likely to be overwritten by or hybridized with the examples in style LoRAs.
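
To sanity-check these conclusions against the ratings above, the entries can be tabulated and averaged. The following is only a rough sketch: the rows are copied from the single-outfit entries that give both an x/y count and a numeric result (Misaki’s asterisked 4/5 is treated as a plain 4), the cut-offs of 8 images and 4/5 consistency are assumptions chosen for illustration, and the function names are hypothetical. With 18 subjective ratings this shows a tendency, not a measurement.

```python
from statistics import mean

# (name, complexity, consistency, images with footwear, total images, result)
ROWS = [
    ("Usagi Kurokawa", 1, 5.0, 14, 24, 5.0),
    ("Ella",           2, 5.0, 14, 18, 5.0),
    ("Karis (1.0)",    3, 4.0,  9, 21, 4.0),
    ("Misa Satsukino", 1, 5.0,  4, 59, 3.5),
    ("Tara Grimface",  4, 5.0, 14, 29, 5.0),
    ("Misaki",         3, 4.0,  6, 26, 4.0),  # asterisked in the write-up
    ("Ellie",          3, 2.0,  6, 23, 1.0),
    ("Pratty",         4, 4.0,  7, 24, 2.0),
    ("Razzy",          3, 4.0,  7, 20, 3.0),
    ("Sanary",         3, 5.0,  7, 22, 4.0),
    ("Rifmonica",      2, 4.0, 11, 30, 4.0),
    ("Murno",          1, 5.0,  2, 21, 3.0),
    ("Karin",          4, 2.0,  5, 19, 3.0),
    ("Sheba",          3, 3.0,  8, 24, 3.0),
    ("Rena",           4, 4.0,  8, 29, 4.5),
    ("Karis (V2)",     3, 4.5, 16, 32, 5.0),
    ("Karst",          2, 4.0, 11, 30, 4.0),
    ("Lilith",         3, 4.0,  3, 22, 2.0),
]


def mean_result(rows):
    """Average the subjective result column, rounded to two decimals."""
    return round(mean(r[5] for r in rows), 2)


def split(rows, predicate):
    """Partition rows into (matching, not matching)."""
    yes = [r for r in rows if predicate(r)]
    no = [r for r in rows if not predicate(r)]
    return yes, no


# Assumed cut-offs: 8 footwear images, and a consistency rating of 4/5.
many, few = split(ROWS, lambda r: r[3] >= 8)
steady, shaky = split(ROWS, lambda r: r[2] >= 4)

print("8+ footwear images:  ", mean_result(many))
print("under 8 images:      ", mean_result(few))
print("consistency 4/5 or up:", mean_result(steady))
print("consistency under 4: ", mean_result(shaky))
```

On these rows the entries with 8 or more footwear images average about 4.4 stars against about 2.8 for the rest, and the entries rated 4/5 or better on consistency average about 3.9 against about 2.3. Raw counts ignore the notes about partially obscured or chibi images, so the gap should be read loosely.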
