OC LoRA Part 7: Pupil and Lower Body Training and Impacts of Inpainting and Captioning

This is the seventh article detailing my training experience using synthetic data for an original character. The main goal of this character design was to apply everything I learned from the earlier challenges to a complex character with a unique pupil design. In my experience, most character LoRAs on civitai fail to capture pupil details, so this can be considered a difficult concept. I will eventually try out more complex designs, so this is only the first step in the process.

As usual the general disclaimer:

Lowset LoRA/ Single Image LoRA

This is a series where I post my personal findings from training a LoRA with a low or practically non-existent dataset to see what I can come up with. Future posts will most likely no longer be single image, but I am still looking for the minimum effort that retains as much detail as possible while keeping the LoRA flexible. The overall goal is to make new consistent characters from generations.

As a general disclaimer, these findings may or may not work for you.

The prior articles can be found below:

  1. Part 1

  2. Part 2

  3. Part 3

  4. Part 4

  5. Part 5

  6. Part 6

Character Design

Design process was very similar to the Elris LoRA where I started with a 1024x1024 generation and then outpainted the rest of the body. I initially started off with a bikini design since I was relying on sketching over pure generation at the time. This isn't necessary as you can always use ControlNet's generative fill to strip your character for dataset augmentation. For more complex designs, outfit augmentation via skimpy outfits is highly recommended. A simpler design is generally easier to use with img2img style augmentation.

Unique Details:

  • Quarter Rest Pupils

  • Treble Clef Accessory for Beret

  • Music Note Accessory for neck

  • Hollow Leaf Design on the left side

  • V shape shirt design

  • Hair tuft on the right side

  • two pairs of floating strands

  • Black and white skirt design

  • Boots with a v top cut

More details were added compared to Elris to see if all bases are covered for capturing every character detail.

Initial Dataset:

  • Full body image of original costume

  • Full body image of character in bikini without any headwear

Style Augmentation

I took a slightly different turn with the augmentation due to the character's unique pupil design. The process is the same as my Elris LoRA but I used face censorship to prevent the LoRA from learning the wrong pupil design. There shouldn't be too much of a focus on the style augmentation as getting rid of style bleeding completely is very time consuming and in most scenarios it is better to use a style LoRA instead. The main benefit of adding some form of style augmentation is so that the shading style isn't overtrained which can cause background details to be overly simplified.

Camera Augmentation

  • Crops for portrait, upper_body, and cowboy_shot were added. This is the same as with the other characters, nothing new.

Censor Augmentation

  • Placed a censor over the character's face for the main outfit and then applied camera augmentation for that pose as well. Details for censor augmentation can be found here.

Image Closeups:

  • Fairly aggressive with niche view types such as Chest View, Torso View, Lower Body View, and Hat with Eyes

  • Close-ups of beret, boots, music note, and eyes
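The camera and censor augmentations above come down to box arithmetic on the full body image. Here is a minimal sketch of that arithmetic, not the exact process I used: the shot fractions, the padding, and all function names are assumptions made for illustration, and the face box is expected to be measured by hand.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

# Hypothetical share of the figure height kept by each shot type, measured
# from the top of the character. These are guesses to tune per image.
SHOT_FRACTIONS: Dict[str, float] = {
    "portrait": 0.25,
    "upper_body": 0.5,
    "cowboy_shot": 0.7,
}


def camera_crops(figure: Box) -> Dict[str, Box]:
    """Return one crop box per shot type for a full-body figure box."""
    left, top, right, bottom = figure
    height = bottom - top
    return {
        name: (left, top, right, top + round(height * fraction))
        for name, fraction in SHOT_FRACTIONS.items()
    }


def censor_box(face: Box, pad: int = 8) -> Box:
    """Grow the face box by `pad` pixels so the censor covers the whole face."""
    left, top, right, bottom = face
    return (max(0, left - pad), max(0, top - pad), right + pad, bottom + pad)


def censor_in_crop(censor: Box, crop: Box) -> Box:
    """Translate a censor box into the coordinates of a cropped image."""
    left, top, right, bottom = censor
    crop_left, crop_top, _, _ = crop
    return (left - crop_left, top - crop_top, right - crop_left, bottom - crop_top)
```

The resulting tuples use the common (left, top, right, bottom) order, so they should be usable with most image editors or libraries when cutting the crops and drawing the censor rectangle.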

Repeats Strategy:

  • 1: Loopbacks, custom outfit images, style augmentation

  • 3: Camera Augmentation of the Main Costume

  • 15: Image Closeups

  • 20: Eye Closeups

I generally prefer using a single repeat for any type of loopback as generated images are error prone and minor problems can be difficult to notice. From testing with loopback, custom outfits surprisingly do not need high repeats; there only needs to be enough variation, and they should be captioned very well.

Full body, upper body, and cowboy shots do not need a high repeat count. I have found that while these shots help with overall detail, they don't help with the finer character details. The finer details need to be added as close-ups in a folder with a high repeat count. I found that anything below 10 repeats doesn't really work.

Eyes are set at a very high repeat count as most LoRAs tend to struggle with pupil detail. It could work with 10, but I don't recommend going beyond 20 as I didn't notice any improvements. Eyes were placed in a different folder named pink_eyes, but I found that using a separate class name wasn't necessary.
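To see how the repeat counts shift the balance of the dataset, it helps to multiply each folder's image count by its repeats. This is a rough sketch: the repeat values come from the list above, but the folder names and image counts are made-up placeholders, not my actual dataset.

```python
import math
from typing import Dict, Tuple

# folder name -> (repeats, image count).
# Repeats follow the article; the image counts are hypothetical placeholders.
DATASET: Dict[str, Tuple[int, int]] = {
    "loopback_and_style": (1, 30),
    "main_costume_camera": (3, 6),
    "closeups": (15, 8),
    "eyes": (20, 2),
}


def samples_per_epoch(dataset: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Number of times each folder's images are seen in one epoch."""
    return {name: repeats * count for name, (repeats, count) in dataset.items()}


def share_per_folder(dataset: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Fraction of an epoch spent on each folder."""
    samples = samples_per_epoch(dataset)
    total = sum(samples.values())
    return {name: seen / total for name, seen in samples.items()}


def steps_per_epoch(dataset: Dict[str, Tuple[int, int]], batch_size: int) -> int:
    """Optimizer steps per epoch for a given batch size."""
    return math.ceil(sum(samples_per_epoch(dataset).values()) / batch_size)
```

With these placeholder counts, the 8 close-up images are seen 120 times per epoch while the 30 loopback and style images are seen only 30 times, which is the kind of imbalance the repeats are meant to create.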

Captioning Strategy:

  • Minimal Pruning

  • Mostly with Autocaptioning

  • Remove erroneous tags

  • Prune common tags such as shirt, skirt for custom outfits

    • Ensure that custom outfits have color attached to the common tag, e.g. white_shirt, yellow_shirt, etc.

  • Eye color should not be pruned.

  • mainoutfit tag added to original costume

  • Add no_headwear to related outfits

  • alternativeoutfit tag added to custom costumes

  • Add related camera angles to the caption: from_behind, from_side, cowboy_shot, etc.

  • trigger word added

  • Prune any strange series or character names

I personally prefer a minimal pruning strategy with auto-captioning since I'm in the camp that LoRAs learn based on what they are captioned with, rather than pruning to force certain concepts. The latter was more the case with SD1.5, but I noticed that SDXL behaves rather differently. Captioning isn't an exact science as there are a ton of interlocked variables, but these are the behaviors that I have noticed with my LoRAs.

Erroneous tags need to be pruned since the autocaptioner isn't always correct. It may caption eye color incorrectly or add an unrelated outfit tag. This can cause unintentional color bleeding between outfit tags.

Common tags without color attached should be pruned for custom outfits. This is to prevent color bleeding from custom outfits onto the main outfit. Pruning these tags for the main outfit tends to cause a color bias for generation.

Eye color should not be pruned since this will cause pupil detail to be lost.

Related outfit tags were added, but I suspect this is more of a placebo since custom names aside from the trigger word do not seem to help at all.

Camera Poses should be added to prevent any type of weird body position overfitting.

Trigger word is fairly standard.

Prune any strange names as we don't want any other characters to accidentally bleed into the LoRA.
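The captioning rules above can be sketched as a small clean-up pass over the autocaptioner's tags. This is only an illustration under my own assumptions: the function name, its parameters, and the example trigger word are hypothetical, and the erroneous tags still have to be spotted by eye.

```python
from typing import Iterable, List

# Uncolored outfit tags that get pruned from custom outfits only.
COMMON_OUTFIT_TAGS = {"shirt", "skirt"}


def clean_caption(
    tags: Iterable[str],
    trigger: str,
    outfit_tag: str,
    is_custom_outfit: bool,
    erroneous: Iterable[str] = (),
    camera_tags: Iterable[str] = (),
) -> List[str]:
    """Apply the minimal-pruning rules to one image's tag list."""
    bad = set(erroneous)
    kept = []
    for raw in tags:
        tag = raw.strip()
        if not tag or tag in bad:
            continue  # wrong eye color, unrelated outfit, stray series names
        if is_custom_outfit and tag in COMMON_OUTFIT_TAGS:
            continue  # keep only colored variants such as white_shirt
        kept.append(tag)  # eye color is never pruned here
    ordered = [trigger, outfit_tag, *camera_tags, *kept]
    return list(dict.fromkeys(ordered))  # drop duplicates, keep order
```

For example, a custom outfit captioned as `1girl, pink_eyes, shirt, white_shirt, blue_eyes` with `blue_eyes` marked as erroneous comes out as the trigger word, `alternativeoutfit`, any camera tags, then `1girl, pink_eyes, white_shirt`.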

Training Settings

Settings are the same as the Elris LoRA with the Prodigy optimizer.

Initial LoRA

Test Strategy: With the intent to loopback, my general strategy is to see if the LoRA is at minimum able to capture the finer or basic details without too much distortion. Ideally, the basic shape should be visible, which allows broken details to be fixed via inpainting. Posing at this stage is not a concern as more loopback is generally necessary for a more flexible LoRA. Additionally, I will need to inspect whether any details collapse whenever custom outfits are used.

Impressions:

Quite interestingly, the top, leaf, note, eyelash, and beret designs were learned very easily with 15 repeats. There are some image quality issues with the initial LoRA since I didn't run an upscale with MultiDiffusion. However, I had zero issues with asymmetry for these aspects.

Very interestingly, the LoRA is able to generate each close-up of a body part individually but then struggles to put the parts together from a further distance.

The LoRA appears to struggle greatly with getting the details of the lower skirt correct. It could not capture the asymmetric design, which would often be on the wrong side or have an incorrect shape. In some scenarios, the skirt would not stay hidden under the dress and appeared over it.

Adding more repeats did not seem to help as it appeared to warp the body anatomy. I suspect that since lower_body is typically not the focus of most images, close-ups will not help the LoRA learn that particular detail and some form of loopback or inpainting will be required.

I was able to occasionally force the lower body details in by adding 'lower body' to the prompt, but that also had the side effect of showing half of the body most of the time. I was able to adjust this problem by using loopback, but it didn't help with the design accuracy.

In regards to custom outfits, I found that my character's hair tuft began to undertrain, which implies that for more complicated hairstyles, some form of outfit augmentation is needed. I noticed that outfits that used different headwear struggled the most, in particular the maid outfit.

I initially expected the eyes to be the hardest aspect of training the LoRA, as many LoRAs on civitai seem to get complex pupils wrong. Ultimately, I discovered that this LoRA will still get the pupils incorrect, but the correct pupils will come by using the adetailer extension with a high denoising strength of 0.6 ~ 0.7 and a high inpaint resolution such as 1536x1536. I found that it can work at lower resolutions but mileage will vary. It does require the base LoRA strength to be high at 0.8 ~ 1 in order for the pupils to be inpainted correctly. The full eyes detection model can be found on civitai.
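As a quick sanity check, the eye inpainting ranges above can be written down and validated before a batch run. This is a sketch only: the class and its field names are hypothetical and are not ADetailer's own option names, and the `<lora:name:weight>` tag assumes an A1111-style prompt syntax.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EyeInpaintSettings:
    """Hypothetical container; the field names are not ADetailer's option names."""

    denoising_strength: float = 0.65
    inpaint_width: int = 1536
    inpaint_height: int = 1536
    lora_weight: float = 0.9

    def warnings(self) -> List[str]:
        """List every setting that falls outside the ranges from the article."""
        notes = []
        if not 0.6 <= self.denoising_strength <= 0.7:
            notes.append("denoising strength outside 0.6 ~ 0.7")
        if not 0.8 <= self.lora_weight <= 1.0:
            notes.append("LoRA weight outside 0.8 ~ 1")
        if min(self.inpaint_width, self.inpaint_height) < 1536:
            notes.append("inpaint resolution below 1536, results may vary")
        return notes


def lora_tag(name: str, weight: float) -> str:
    """Build a prompt tag, assuming A1111-style <lora:name:weight> syntax."""
    return f"<lora:{name}:{weight:g}>"
```

The defaults sit in the middle of the ranges that worked for me, so `EyeInpaintSettings().warnings()` returns an empty list, while a low denoise or a low LoRA weight gets flagged.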

The main takeaway for pupil training is that you do not want to prune eye color in the captioning since it can cause the pupils to turn to a different object instead.

Next Iterations

Loopback Training

Loopback training is the process of taking images generated with the LoRA and adding them back to the dataset in hopes of improving the LoRA quality. It is the last-ditch effort if no other adjustments can be made to improve detail capture. It generally requires very careful attention to detail as the LoRA can pick up errors that you do not want. If you want an all-in-one LoRA, then some loopback will be needed as a single image is generally not enough for a character to be fully rotatable or positioned in unique angles. However, if all you want is character likeness and the ability to swap outfits, then training a different LoRA with the approach for Kie's LoRA is enough. (The main issue with Kie's LoRA approach is that the original design tends to be undertrained with complex details.) Loopback training is only necessary if you want to be very pedantic with details and flexibility for a single LoRA. I recommend some loopback training just to help with posing other costumes as that is easier to work with.

Loopback Strategy

  • Different poses and Camera Angles for custom outfits (nudity, swimsuit, skimpy, etc.)

  • Censor augmentation with custom outfits

  • Different poses and Camera Angles for Original Outfit (Done in the last iteration)

  • Different Colored Outfits with crossover with Original Outfit (Shirt, Dress, Skirt)

  • Limit Outfit Type to about 3 ~ 5 images

  • Use a weight of 0.8 or lower when creating loopbacks

  • Don't use artist names for loopback

Ideally, add the outfits that you want so that you have an easier time with generating them. Loopback follows general LoRA creation tips with diversifying pose, outfits and camera angle. One thing to keep in mind is that loopback will cause your character to slightly undertrain so you need to add more images of your original design. Additionally, I recommend using MultiDiffusion as an upscaling process to remove any antialiasing artifacts in the image.

I generally use a rule of 3 ~ 5 images per outfit type as I noticed that it usually takes around that many images to prevent bleeding from the original design.

If you don't want nudity, then any bikini can work. Nudity in particular is just an outfit that generates a ton of tags with an auto-captioner and has a lot of crossover with any outfit that reveals skin. Overall, nudity just reduces the amount of work when it comes to creating other revealing outfits. In the context of LoRA training, no single outfit will reduce overfitting or help with general flexibility. This means that multiple outfits are a must when making a LoRA flexible.

When running loopback with outfits that are very similar to the original design, I recommend creating color variants that do not match the color of the original outfit. My character has a black dress so I create different dress outfits that are colored red, pink, etc. This will allow you to generate similar outfits without having the original design bleeding through. Do not add the same color outfit if they share a similar outfit tag, as the LoRA will not be able to tell the outfits apart, which will cause undertraining of your original design. An outfit with the same color will still be usable but it requires the LoRA to be used at a low weight of 0.4 ~ 0.6.

I recommend using a lower weight when doing loopback since you don't want the LoRA to overtrain on mistakes from a weight of 1. The lower you can go, the better the overall results will be, along with reduced style bleeding. The downside is that you have to balance this against how much fine detail is captured, as lower weights generally mean that the LoRA won't capture all of the details and occasionally inpainting or editing fixes will be necessary. You can use censor augmentation if the face details are too different but the hairline remains the same.

From my experience, AI tends to struggle with capturing facial detail in full body shots at a far distance. It appears to still learn outline and colors, which is something that you could use to your advantage. This means you could get away with minimal editing for faraway shots. It's fairly risky but works with simpler hair designs. I recommend that the eye color be the same shade as in the original image so that incorrect coloring isn't learned.

For characters with far more complex hair outlines or facial shapes, I recommend doing a two pass generation using the mistoline controlnet.

  1. First, generate your character.

  2. In an art editor, crop out the face and resize or transform it as needed.

  3. Afterwards, place the image into the mistoline controlnet with softedge_anyline in txt2img and then generate. You will need to experiment with the controlnet weights and end step for best results. Another advantage of this approach is that you can use low weights such as 0.4 and 0.6. You may need to do some minimal inpainting. This approach can help with fixing the lighting around the character's face, which I tested for a different character but didn't bother with for this one. (Due to laziness)

Regrettably, with an original design, loopback isn't easy and some inpainting will be needed, but the goal of the initial step was to get all of the basic detail in so that inpainting for the original design wouldn't be as painful.

Don't overuse any pose, including head position and hand positioning, as that will cause the LoRA to be overtrained.

I don't recommend using different artist names for the loopback generation as these images could unintentionally change your LoRA's color palette.

In my LoRA's loopback, I used censor augmentation to help minimize the amount of inpainting that I would need to do. I generated images with different poses and outfits and then placed a censor over the face. At this point, I did not add any loopbacks of the original design.
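The loopback rules in this section can be sketched as a pre-check on candidate images before they go into the dataset. This is a rough illustration and not a tool I actually used: the `Candidate` fields are hypothetical bookkeeping you would fill in by hand, and the only colored tag taken from my character is black_dress.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

ORIGINAL_COLORED_TAGS = {"black_dress"}  # colored tags of the original outfit
MAX_WEIGHT = 0.8       # generate loopbacks at 0.8 or lower
MAX_PER_OUTFIT = 5     # upper end of the 3 ~ 5 images per outfit type rule


@dataclass
class Candidate:
    filename: str
    outfit: str
    tags: List[str]
    lora_weight: float
    used_artist_name: bool = False
    is_original: bool = False  # loopback of the original design itself


def reject_reason(c: Candidate) -> Optional[str]:
    """Return why a candidate breaks a loopback rule, or None if it passes."""
    if c.lora_weight > MAX_WEIGHT:
        return "LoRA weight above 0.8"
    if c.used_artist_name:
        return "artist name in prompt"
    if not c.is_original and ORIGINAL_COLORED_TAGS & set(c.tags):
        return "shares a colored outfit tag with the original design"
    return None


def select(candidates: List[Candidate]) -> List[Candidate]:
    """Keep passing candidates, capped at MAX_PER_OUTFIT per outfit type."""
    kept: List[Candidate] = []
    per_outfit: Counter = Counter()
    for c in candidates:
        if reject_reason(c) is not None:
            continue
        if per_outfit[c.outfit] >= MAX_PER_OUTFIT:
            continue
        per_outfit[c.outfit] += 1
        kept.append(c)
    return kept
```

A check like this only covers the mechanical rules. Pose variety and incorrect details still have to be reviewed by eye.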

Other Additions

  • To address an issue from the first iteration, I added costume augmentations of school outfit, dress, maid, bunny girl, wedding dress, and kimono

  • As stated prior, I would recommend adding outfits that have a crossover with the original design along with some outfits regarding the hairline in order to have hair placement trained properly with different outfits

  • Camera augmentation was applied for these outfits since I wanted to capture the hairline more accurately

  • I experimented with eye repeats here as well, lowering from 20 to 15.

Impressions of further Iterations

  • Costume Augmentation helped with hairstyle consistency across multiple outfits but some undertraining persists

  • There is some concept bleeding between outfits but it is not as overly apparent compared to the first iteration. (Colors and basic shape from original outfit would always try to bleed over)

  • I was fairly lazy with double checking the art design for the custom outfits, so there is a notable style difference between custom outfits and the original design.

  • Very interestingly, the loopback images also helped make the original design a bit more flexible with poses. (However, it is still very overtrained.)

  • Minor details are learned very quickly, so there needs to be attention to quality regarding the character aspects. Cheating with censor augmentation to hide incorrect details regarding bad loopback appears to help (only for face censor).

  • Main outfit appears to be untouched for the most part but some color bleeding is occasionally showing through the shirt, dress, and skirt colors. Emphasis on black appears to help suppress the effect

Failure of Applying Different Backgrounds to the Same Image

I tried another iteration where I added different backgrounds to the original design to see if that would help get the lower body more correct. My results were unfortunately very disappointing, as the difference wasn't noticeable enough to make a solid claim. My basis for this idea was that LoRAs very likely learn things relative to landmarks based on how they were captioned, so I speculated that I could use background keywords to help place the details correctly. Very interestingly, I also found that censoring the face here caused problems with inpainting the eyes properly.

Last Attempt with Loopback

This was the proper loopback attempt using the original design. The goal here was to use posing keywords and then add those images to the dataset. Camera Augmentation is needed here so that the eye details are not lost. I mainly had cowboy shots, upper body, portrait, and lower body shots. The LoRA is more overfit compared to the prior one, so more emphasis is needed to get the LoRA to pose correctly.

My initial impressions were that it did have better costume fidelity compared to the background iteration, but it also caused the eye pupils to become underfitted towards the lower half. I wonder if I just need to increase the number of repeats for the eyes back to 20. However, I am hitting diminishing returns so I am stopping here.

Oddly enough, Kie did not struggle with capturing the skirt detail despite having lower repeats. The main difference between Kie and Prisca is that Kie's skirt has more landmark captions such as thighs, thighhighs, and pleated_skirt. My current speculation is that the more captions involve specific body parts, the easier a time Animagine will have learning lower body details. Long skirt and long dress unfortunately do not seem to have the same impact.
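One way to test the landmark speculation on another dataset is to count how often the landmark tags show up in the captions. This is a minimal sketch that assumes comma-separated caption strings; the function name is made up, and the tag list only contains the three tags mentioned above.

```python
from typing import Dict, Iterable, Sequence

# Landmark tags named in the article; extend with your own.
LANDMARK_TAGS = ("thighs", "thighhighs", "pleated_skirt")


def landmark_coverage(
    captions: Iterable[str],
    landmarks: Sequence[str] = LANDMARK_TAGS,
) -> Dict[str, float]:
    """Fraction of captions that contain each landmark tag (exact match)."""
    tag_sets = [{tag.strip() for tag in caption.split(",")} for caption in captions]
    if not tag_sets:
        return {tag: 0.0 for tag in landmarks}
    return {
        tag: sum(tag in tags for tags in tag_sets) / len(tag_sets)
        for tag in landmarks
    }
```

Running this over two characters' caption files gives a rough comparison of how many lower body landmarks each dataset carries. It cannot confirm the speculation by itself.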

My work mostly involves inpainting, so as a hybrid workflow this particular LoRA has already reached the standard of an SD1.5 LoRA, but it is not quite at the level of an SDXL LoRA for txt2img purposes.

Miscellaneous

  • Better to use a style LoRA than to overly focus on reducing style bleeding

  • Katagari's Inpaint controlnet can be used to get PonyXL costume LoRAs to work together. Unlike SD1.5, this will require a low ending step and experiments with IP-Adapter since reference isn't compatible with other CNs in SDXL.

  • When used in conjunction with other clothing LoRAs, I noticed that the hair tuft begins to undertrain.

  • Image quality issues can be fixed by upscaling with MultiDiffusion using an SD1.5 checkpoint, but eye details will need to be fixed afterwards

  • (Hybrid Workflows) Recommended to use the original image together with the LoRA as a reference to photobash finer details together and then inpaint to blend things together.

Summary of Things Learned

  • Don't prune eye color

  • Loopback is incredibly sensitive

  • Save more time with using a style LoRA over using too much style augmentation

  • Asymmetric details can be enforced with high enough repeats

  • Outfit Augmentation with Camera Augmentation helps with learning hairstyle

  • Censor Augmentation with Outfit Augmentation can help with posability

  • LoRA appears to undertrain when paired against Style and Clothing LoRAs

  • When prompting, consider body or clothing aspects as landmarks for posing (sleeves, top, thighs, neck, ass, etc)

Summary of Things that Failed

  • Pruning characteristics

  • High repeats will not solve everything

  • Different class names for outfit type

Future Plans

  • Character with overly fancy eyelid design.

  • Character with overly fancy dress/skirt design

  • Character Turnaround for higher overall complexity

End Note

If you made it this far, then thanks for reading! This character ended on a bit of a failure but I learned a lot in the process. As always, the dataset is available on the model page if you want to look at the dataset instead of reading. There is some NSFW content in the dataset as part of the loopback.
