
A Guide To Ruin Your Day: How to Caption a Character LoRA

Why?

I see so much misinformation and confusion surrounding this topic and I want to help people understand why what they're doing isn't working and how to improve their captioning.

No, why is it going to ruin my day?

Well... you see, all those captioning tools are good for approximately one thing: making text files that have your trigger word in them. Beyond that, they're... not very helpful. OK, a LITTLE helpful. Captioning is extremely tedious and there's no really good way around it. It's work.

Stop Being Coy - How To Caption

Character LoRAs are, by and large, fairly simple to understand once you realize that "trigger word" is a horrible misnomer. Your trigger word isn't some mystical word you pass to the AI that makes it go tap on your LoRA and make it run.

This is your trigger word:

It is your character in a neutral pose, with a neutral expression, on a transparent background (use your imagination, I'm too lazy to open Photoshop for this). You are telling the AI that this is exactly what your character looks like when you type the trigger word. In this image she has clothes on, but ideally you should imagine your character nude as well.

So what?

So, on to captioning. Say you put: 1girl, solo, Aik-J4cky, black hair, yellow eyes.

What are you saying to the AI? You're saying Aik-J4cky is the figure above, but black hair and yellow eyes are optional components of that base figure.

If you specify them, they're not intrinsic to the base figure: they get attached to their own tags instead of to the trigger word.

We use this to our advantage for clothes: in the above image I would caption her sneakers, watch, shirt, and hair tie. She always wears glasses, so I would not caption her glasses.

What to caption

  • Poses

  • Expressions

  • Clothes

  • Backgrounds (even simple background or white background, etc.)

  • Anything you want to change about the character later (I captioned fangs for my Mykhaila LoRA as I wanted the fangs to be optional).

  • Teeth / open mouth / tongue - these are not part of the neutral default, so tag them if they're present

What not to caption

  • Unique identifiers!

  • Hair color

  • Hair style

  • Eye color

  • Skin color

  • Height

  • Weight
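
The two lists above can be turned into a quick sanity check on a caption. This is only a minimal sketch: the `IDENTITY_TAGS` set and the function names are hypothetical, and the set has to be written by hand for each character (here it is a guess at Jacky's intrinsic traits).

```python
# Hypothetical blocklist of traits that belong to the character herself.
# Write one per character; these entries are illustrative, not a standard list.
IDENTITY_TAGS = {"black hair", "long hair", "wavy hair", "yellow eyes", "dark skin", "glasses"}


def split_tags(caption: str) -> list[str]:
    """Split a comma-separated caption into trimmed, non-empty tags."""
    return [tag.strip() for tag in caption.split(",") if tag.strip()]


def find_identity_tags(caption: str, identity_tags: set[str] = IDENTITY_TAGS) -> list[str]:
    """Return the tags that describe intrinsic traits, in caption order."""
    return [tag for tag in split_tags(caption) if tag.lower() in identity_tags]


flagged = find_identity_tags("1girl, solo, Aik-J4cky, black hair, yellow eyes")
print(flagged)  # ['black hair', 'yellow eyes']
```

Anything the check flags is a candidate for deletion, unless it is a trait you deliberately want to keep optional (like the fangs mentioned above).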

Weird Stuff

  • 1girl, solo

I saw a guide whose author always uses these tags, so I always use them too. Do they make a difference? I'm not sure. I figure that captioning 1girl and solo may help her appear alongside other characters down the line: you may want 2girls or 3girls in your prompt with her, and you may NOT want solo in your prompt.

Example

To caption this we would start with:

Aik-J4cky, Recall that this says Jacky in a neutral pose with a neutral expression, nude, on a blank background, so we'll have to add in everything that differs from that. First let's add the standbys.

1girl, solo, From here I'd add the background. (Honestly, the order of tags doesn't matter, and you can even shuffle tags during training to help learning, but I have a way I do it and I just stick to it.)

outdoors, canyon, next up clothes

torn pants, denim pants, jeans, jacket, white shirt, blue blazer, then we can move to pose

sitting, Yeah, that's it here. I don't want to use something like "holding jacket", as that tag has a connotation of holding a jacket that is not being worn. Gross movements (leg up, arms raised) - good. Small movements (holding anything) - bad.

annoyed, I used to be really lax about tagging expressions until Sanne finally wore me down. She wears a permanent scowl by default because her LoRA mixes every expression. I didn't tag the expressions in the images I created, so training treated every expression as intrinsic, and my dataset was just poor. It tries to average the expressions in the dataset and thus - perma-scowl. For the story I was writing it worked in my favor, since she was a sad, angry little elf, but it also taught me a valuable lesson: tag your expressions.
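
Put together, the walkthrough above amounts to joining a few groups of tags in a fixed order. Here is a minimal sketch of that; `build_caption` and its parameter names are made up for illustration, and the grouping is just my habit, since order doesn't matter to training.

```python
def build_caption(trigger, standbys, background, clothes, pose, expression):
    """Join the tag groups in a fixed order, skipping any duplicate tags."""
    groups = [[trigger], standbys, background, clothes, pose, expression]
    tags = []
    for group in groups:
        for tag in group:
            if tag not in tags:
                tags.append(tag)
    return ", ".join(tags)


caption = build_caption(
    trigger="Aik-J4cky",
    standbys=["1girl", "solo"],
    background=["outdoors", "canyon"],
    clothes=["torn pants", "denim pants", "jeans", "jacket", "white shirt", "blue blazer"],
    pose=["sitting"],
    expression=["annoyed"],
)
print(caption)
```

Note what is missing from every group: hair, eyes, skin, and glasses never appear, because those belong to the trigger word.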

Autocaptioners

So why do autocaptioners not work?

Autocaptioning will caption everything in the image. It will caption everything in both the "What to caption" and "What not to caption" lists. This will make your LoRA kind of garbage. It will make your black hair come out brown or blonde or red. It will make your images very inconsistent. You can use autocaptioners and then edit the captions to remove the identifying information. What you can almost never do is use them, leave the captions unedited, and hope for a good result.

For example, here's the output of auto-captioning on the above image:

1girl, dark-skinned female, dark skin, black hair, torn clothes, solo, torn pants, long hair, glasses, pants, denim, yellow eyes, jacket, looking at viewer, sitting, jeans, shirt, mole, wavy hair, breasts, mole under eye, outdoors, white shirt, lips, torn jeans

You can certainly use it, delete all the red text (the identifying tags from "What not to caption"), and it would be pretty good - just be sure to add your expression tags.
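
That edit pass can be sketched in a few lines. Everything named here is an assumption: `clean_autocaption` is a hypothetical helper, and `JACKY_IDENTITY_TAGS` is my reading of which tags in the output above are identifying, based on the lists earlier in this guide.

```python
# Assumed set of identifying tags for this character (hand-written, hypothetical).
JACKY_IDENTITY_TAGS = {
    "dark-skinned female", "dark skin", "black hair", "long hair", "glasses",
    "yellow eyes", "mole", "wavy hair", "breasts", "mole under eye", "lips",
}


def clean_autocaption(raw, trigger, identity_tags, extra_tags=()):
    """Put the trigger first, drop identity tags, then append manual tags."""
    kept = [trigger]
    for tag in raw.split(","):
        tag = tag.strip()
        if tag and tag.lower() not in identity_tags and tag not in kept:
            kept.append(tag)
    for tag in extra_tags:  # e.g. the expression tags the autocaptioner missed
        if tag not in kept:
            kept.append(tag)
    return ", ".join(kept)


raw = (
    "1girl, dark-skinned female, dark skin, black hair, torn clothes, solo, "
    "torn pants, long hair, glasses, pants, denim, yellow eyes, jacket, "
    "looking at viewer, sitting, jeans, shirt, mole, wavy hair, breasts, "
    "mole under eye, outdoors, white shirt, lips, torn jeans"
)
cleaned = clean_autocaption(raw, "Aik-J4cky", JACKY_IDENTITY_TAGS, extra_tags=["annoyed"])
print(cleaned)
```

You still have to read the result. The filter only removes what you told it to remove, and it won't notice a missing expression or a wrong pose tag.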

Final Thoughts

Hopefully this was helpful to some people. My limited testing with Flux has shown that tag-based captions work fine with Flux as well (Jacky, again, works just fine having been trained with tag-based captions by accident), although the preferred method would be to use purple prose crap.

In those scenarios the concept is the same; you're just writing long, flowery sentences instead of tags, e.g. Aik-J4cky sits outdoors on a rocky canyon outcropping. She is wearing a blah blah blah. Her expression is x. It doesn't have to be AS flowery as if you were prompting.
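
If you want to see the sentence version built from the same pieces as the tag version, here is a rough sketch. `prose_caption` and `join_items` are hypothetical helpers, the pronoun is hardcoded for this character, and the wording is only a template, not anything Flux requires.

```python
def join_items(items):
    """Join a list as 'a', 'a and b', or 'a, b and c'."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def prose_caption(trigger, pose, setting, clothes, expression):
    """Describe only what differs from the neutral base figure, as sentences."""
    sentences = [f"{trigger} {pose} {setting}."]
    if clothes:
        sentences.append(f"She is wearing {join_items(clothes)}.")
    sentences.append(f"Her expression is {expression}.")
    return " ".join(sentences)


print(prose_caption(
    trigger="Aik-J4cky",
    pose="sits",
    setting="outdoors on a rocky canyon outcropping",
    clothes=["torn jeans", "a white shirt", "a blue blazer"],
    expression="annoyed",
))
```

The same rule applies as with tags: the sentences never mention hair, eyes, or skin, because the trigger word already covers them.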
