Is It Dumb to Manually Tag Your Dataset?

Yes, This Is Mostly a Rant About Autotaggers

TL;DR

This is more of a rant than an informative read.

I prefer tagging my dataset manually because autotagger results usually suck for the things I actually care about. They are not always completely useless, and they are not always wrong, but for my workflow, they often give me more cleanup work than actual help.

I still debate this with myself sometimes. Do I really need an autotagger to fill the holes in my dataset, or should I just keep tagging everything manually like a stubborn idiot?

Honestly, I keep coming back to the same answer: why would I let a tool make a messy caption first, then spend my time fixing it, when I can just make the caption properly from the beginning?

Small Clarification Before Someone Calls Me Out

When I say I tag manually, I do not mean I sit in a raw text editor and type every single tag with zero helper tools. That would make me suffer too.

I still use BatchBench, my own tagging tool, because it makes the manual part less miserable. It helps me review images in my own flow, keep useful tag references nearby, resume where I left off, and clean up or apply tags without dealing with the annoying parts of the on-site editor.

But the important part is this: BatchBench does not decide whether an image should be tagged as upper_body, whether a hairstyle is visible enough, whether an outfit tag is useful, or whether a background detail matters for controllability later.

Those decisions are still mine.

So “fully manual” is probably not the right phrase. My workflow is more like manual-first tagging, or human-guided tagging with quality-of-life tools.

My problem is not tools. I like tools. My problem is letting an autotagger write the caption first, then needing to repair its homework.

How I Ended Up Becoming This Annoyed

When I first started making LoRAs, I did not know what the hell I was doing. You see this? That LoRA was made almost completely blind. I used an autotagger, copied other people's on-site training settings, and just hoped things would work somehow.

That was basically my entire method:

“Other people seem to do this, so surely I can also do this.”

Very scientific. Very professional.

Later, I found a colleague who told me that I should remove tags related to the character's appearance and outfit details so those things could be “absorbed” by the trigger prompt. So I did that. I started looking through my dataset and removing some tags because I wanted to follow that idea: hair details, outfit details, accessories, and all that stuff. The trigger prompt was supposed to carry the character identity by itself.

After a couple of trainings using that method, though, I found myself dissatisfied because I could not swap the character's outfit properly anymore. The outfit kept sticking to the character because, surprise surprise, I basically told the model that the outfit was part of the character's default identity.

In hindsight, that was expected. The outfit got baked into the trigger prompt.

You can see that with this. That LoRA originally used the “absorb everything into the trigger” method. But because the police outfit kept appearing, I tried tagging the outfit separately using simple clothing tags instead.

And it worked.

After that, I slowly started adding more identity-related tags, such as general outfit details and character traits. A lot of this was also influenced by LittleJelly's article about LoRA training. He explained a bunch of interesting stuff about tagging, and it made me think more seriously about what I was actually teaching the model.

I also had plenty of discussions with him in the comments of that article, and with people in the CivitAI Discord. One thing that kept coming up was that I also needed to tag the things surrounding the character, not just the hair, clothes, and face. Backgrounds, lighting, body framing, camera angle, pose, other people in the image, all that stuff can become part of the character if I keep leaving it untagged.

That was the point where it finally clicked in my head. If I want less random stuff to bake into the character, I need to separate more things in the caption.

At first, I thought it sounded annoying. I did not want to tag everything either. But the more I trained, the more I started getting that weird, uncomfortable feeling whenever I relied on an autotagger, because I slowly realized something: I was not actually avoiding the manual decision-making part.

I was still the one deciding what belonged in the caption. I was just doing it in a more annoying format.

First, I read a machine-generated mess. Then I delete half of it, fix the wrong parts, add the missing tags anyway, and sit there wondering why I even bothered using the autotagger in the first place.

That is the part that slowly pissed me off.

The tool is supposed to be a shortcut, but somehow I still need to become its supervisor, quality-control staff, cleanup crew, and final decision maker. At that point, bro, what exactly did you help me with?

What I Usually Need to Tag

For me, an image usually consists of these parts:

1. Subject count: solo, 1other, 2others, multiple_others
2. Breast size, only when it is visible enough to judge
3. Character appearance: predefined tags such as hairstyle, hair color, eye color, accessories, and other identity traits
4. Character outfit: predefined clothing tags and outfit details
5. Expression, pose, arm gestures, and leg gestures
6. Body framing: full_body, cowboy_shot, upper_body, and so on
7. POV and camera angle: from_side, from_behind, from_above, and similar tags
8. Background details: usually broad context such as outdoors, bedroom, classroom, or forest

These are not the only things that can matter in a dataset. This is just the structure I personally care about when I tag images.

Now look at that list. Out of those eight points, only number 5 and number 8 can really benefit from autotagging for me. Everything else still needs me to inspect the image and decide what is correct.

That is why I get annoyed when people act like autotagging magically removes the hard part. The hard part is not typing tags. The hard part is deciding whether the tag actually belongs there.
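
For anyone who thinks better in code, the eight-part structure above can be sketched roughly like this. This is only an illustration, not how BatchBench works: the category keys, the build_caption function, and the my_character trigger are all made-up names, and it assumes a caption is just a comma-separated tag string.

```python
# Category order mirrors the eight-part list above; the key names are hypothetical.
CATEGORY_ORDER = [
    "subject_count",
    "breast_size",
    "appearance",
    "outfit",
    "expression_pose",
    "framing",
    "camera",
    "background",
]


def build_caption(trigger: str, tags_by_category: dict[str, list[str]]) -> str:
    """Join tags in a fixed category order, skipping empty categories and duplicates."""
    unknown = set(tags_by_category) - set(CATEGORY_ORDER)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    ordered = [trigger]
    for category in CATEGORY_ORDER:
        for tag in tags_by_category.get(category, []):
            if tag not in ordered:
                ordered.append(tag)
    return ", ".join(ordered)


caption = build_caption(
    "my_character",  # hypothetical trigger word
    {
        "subject_count": ["solo"],
        "appearance": ["long_hair", "blue_eyes"],
        "outfit": ["t-shirt"],
        "framing": ["cowboy_shot"],
        "camera": ["from_side"],
        "background": ["outdoors"],
    },
)
print(caption)
# my_character, solo, long_hair, blue_eyes, t-shirt, cowboy_shot, from_side, outdoors
```

The code only handles ordering and typing. Picking which tag goes into each category is still the part that needs eyes on the image.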

Why I Still Need to Check Almost Everything

1. Subject Count

I do not use 1girl, 1boy, and similar tags in my dataset. I usually use solo, 1other, 2others, or multiple_others when needed.

So when the autotagger gives me 1girl, 1boy, and similar tags, I still need to remove them. It is not a huge amount of work by itself, but it keeps happening on every image. When you keep deleting the same useless tags over and over, it gets annoying real fast.
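
To be fair, this specific cleanup is mechanical enough that it could be scripted if I ever did start from an autotagger draft. A minimal sketch, assuming the draft is a comma-separated tag string; the blocklist and the function name are hypothetical:

```python
# Hypothetical blocklist: subject-count tags this workflow never keeps.
UNWANTED_SUBJECT_TAGS = frozenset({"1girl", "1boy"})


def drop_unwanted_tags(caption: str, unwanted: frozenset[str] = UNWANTED_SUBJECT_TAGS) -> str:
    """Remove blocklisted tags from a comma-separated caption, keeping the original order."""
    tags = [tag.strip() for tag in caption.split(",")]
    kept = [tag for tag in tags if tag and tag not in unwanted]
    return ", ".join(kept)


print(drop_unwanted_tags("1girl, solo, long_hair, 1boy, outdoors"))
# solo, long_hair, outdoors
```

It only deletes. Whether the image should be solo, 1other, or something else is still a decision I have to make by looking at it.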

2. Breast Size

Sometimes the autotagger cannot properly differentiate between small and medium breasts. That means I still need to inspect the image and correct it myself.

And since I need to look at the image anyway, what exactly did the autotagger save me from? Typing two words? The actual decision still comes from me.

3. Character Appearance

This one is very likely to be wrong, incomplete, or inconsistent. I use predefined appearance tags because I already know which tags I want for that character. I check the Danbooru wiki, look at examples, and make sure the tag matches the visual detail and can be used consistently across the dataset.

So when the autotagger gives me something vague, slightly wrong, or different from the structure I already prepared, I still need to fix it. And sometimes it is not even slightly wrong. Sometimes it is just straight-up nonsense.

Like, bro. We are looking at the same image. What are you seeing?

4. Character Outfit

Same issue, but now with clothes. Autotaggers can give overlapping or vague tags. For example, instead of giving just t-shirt, it might give both t-shirt and shirt.

Then I still need to decide whether both tags are useful, whether one is redundant, whether one is too broad, or whether I should replace them with the predefined tags I already prepared.
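
Spotting that kind of overlap is something a script could at least flag. Here is a rough sketch; the BROADER_TAG map is a hand-written, hypothetical example and not real Danbooru implication data, and find_overlaps is a made-up helper name:

```python
# Hypothetical "specific tag -> broader tag" pairs, written by hand for illustration.
# This is NOT real Danbooru implication data.
BROADER_TAG = {
    "t-shirt": "shirt",
    "pleated_skirt": "skirt",
}


def find_overlaps(tags: list[str]) -> list[tuple[str, str]]:
    """Return (specific, broader) pairs where both tags appear in the same caption."""
    present = set(tags)
    return [
        (specific, broader)
        for specific, broader in BROADER_TAG.items()
        if specific in present and broader in present
    ]


print(find_overlaps(["solo", "t-shirt", "shirt", "outdoors"]))
# [('t-shirt', 'shirt')]
```

Notice that it only reports the pair. Which of the two tags to keep depends on the dataset, so that call is still mine.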

That is where I start losing patience. I already know what the character is wearing. I already checked the proper Danbooru tags. So now I need to spend extra time correcting a tool that gave me a worse version of the tag list I already made?

Amazing. Very helpful.

5. Expression, Pose, and Gestures

Okay, this one can actually benefit from autotagging. Expressions, poses, hand positions, leg positions, and similar details can be annoying to think about from scratch every time.

The autotagger can give me a rough starting point here. I still check it, obviously, but at least this is one of the areas where it can sometimes save some thinking time.

Not always. But sometimes.

6. Body Framing

Autotaggers are very unreliable here. Sometimes they give the wrong tag, and sometimes they do not give any tag at all.

I do not trust an autotagger to consistently understand the difference between upper_body and cowboy_shot. That difference matters to me because body framing affects how the LoRA understands proportions, apparent height, body scale, and how much of the character is visible in a shot.

If the image shows the waist and upper thighs, I do not want the caption pretending it is just upper_body. If the image cuts around the knees, I do not want the tool randomly calling it full_body.

So yes, I still need to look at it myself. Again.

7. POV and Camera Angle

Same problem. Autotaggers often miss camera angle tags completely, or they give something that does not match how I see the image.

I trust my own eyes more than a machine when deciding whether an image is from_side, from_behind, from_above, from_below, or something else. Perspective can be subtle. A shot can be both from_side and from_above. A shot can look mostly frontal but still have a strong enough angle to matter.

A wrong camera tag can teach the model bad perspective logic, so I would rather skip it than confidently put the wrong thing there. And no, I do not trust the autotagger enough to make that decision for me.

8. Background Details

This is the other category where autotaggers can actually help. For broad context such as outdoors, bedroom, forest, classroom, or street, the result can sometimes be useful as a starting point.

I still need to check it, but at least this is one of the few areas where the tool can give me something usable without immediately making me sigh.

So yeah. Number 5 and number 8. Out of eight major things that I personally care about.

The Actual Problem

You see? Out of the eight categories I usually need to handle, only two can really benefit from autotagging. For the parts I personally care about, I still need to manually review and correct most of the caption structure.

I need to look at every tag and ask myself:

Is this tag actually correct for this specific image?
Is this tag useful for my dataset?
Is this tag redundant?
Is this tag missing?
Should I replace this with the predefined tag I already use?
Is there another important tag that the autotagger completely failed to include?
Why the hell did it call this upper_body?

Then I need to remove wrong tags, replace bad tags, fix inconsistent tags, and add the missing things.
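
If you want to see how much of a draft actually survives that review, a simple set comparison is enough. This is a sketch under my own assumptions: audit_caption is a hypothetical helper, and the example tags are invented to match the kind of mistakes described above.

```python
def audit_caption(auto_tags: list[str], final_tags: list[str]) -> dict[str, list[str]]:
    """Compare an autotagger draft with the hand-checked caption for the same image."""
    auto, final = set(auto_tags), set(final_tags)
    return {
        "kept": sorted(auto & final),     # draft tags that survived review
        "removed": sorted(auto - final),  # wrong, redundant, or unwanted draft tags
        "added": sorted(final - auto),    # tags the draft was missing
    }


report = audit_caption(
    auto_tags=["1girl", "shirt", "t-shirt", "upper_body", "outdoors"],
    final_tags=["solo", "t-shirt", "cowboy_shot", "from_side", "outdoors"],
)
print(report)
# {'kept': ['outdoors', 't-shirt'], 'removed': ['1girl', 'shirt', 'upper_body'], 'added': ['cowboy_shot', 'from_side', 'solo']}
```

In this made-up example, two draft tags survive, three get deleted, and three have to be added by hand.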

YOU ARE DOING MULTIPLE JOBS INSTEAD OF ONE.

First, you let the autotagger make a messy draft. Then you manually inspect and rebuild that draft until it becomes usable.

That is not me avoiding the manual part of tagging. That is me still making the tagging decisions while also babysitting a machine.

Might as well just start from my own caption decisions in the first place. At least then I only do one job from beginning to end.

“But It Helps You Discover Tags”

Another thing I hear sometimes is that autotagging helps people discover tags that should exist.

Yeah, no shit. If you give an image to a tool whose entire job is to throw descriptive tags at it, of course it can sometimes point out a tag that describes something in the image.

But it can also give you the wrong tag, a redundant tag, a vague tag, or a tag that technically describes something but is useless for your dataset.

For me, if I look at an image and notice a detail that I do not know how to tag yet, I do not just wait for an autotagger to tell me what it is. I inspect the image properly, think about what details actually matter, then search the Danbooru wiki and look at examples until I find the tag that fits.

That is how I find tags that should be there.

The autotagger can give suggestions, sure. But a suggestion is not the same thing as understanding what should be included in the dataset. It does not know which details I want to control later, which tags are redundant, which tags can cause concept leakage, or which tag fits my predefined structure better.

Maybe I am just too controlling about my dataset. But I would rather train my own eyes to notice the important stuff than rely on a tool to point at my own image and go, “Hey bro, there is a shirt here.”

“But Autotagging Saves Time”

People are like:

“Oh, but autotaggers save time. They make tagging much easier.”

Saving time my ass.

Maybe it saves time when your goal is:

“Good enough caption, throw it into training, pray.”

And honestly, that is fine. Not everyone wants to inspect every tag. Not everyone cares about getting the exact body framing, camera angle, outfit structure, and background context right.

But my goal is not just to have captions. I want captions that are intentional. I want tags that are consistent. I want the model to understand what belongs to the character and what is just part of the image.

I want better control when I try to swap outfits later. I want to know why my LoRA learned something instead of just staring at a weird generation and going:

“Hmm. Why is that random thing baked in?”

So no, I do not feel saved when the autotagger gives me twenty tags and I have to sit there like:

“Wrong.”
“Redundant.”
“Where is the tag I actually need?”
“Why did you call that upper_body?”

That is not automation. That is unpaid tag moderation.

People sometimes talk about autotagging as if the hard part of tagging is physically typing something like long_hair or cowboy_shot. No. Typing is the easy part.

The hard part is looking at an image and deciding whether the hair is actually visible enough to tag as long_hair, whether that framing is really cowboy_shot, whether the outfit tag is too vague, whether the camera angle is clear enough to tag, whether a detail matters for controllability later, and whether a background object is important or just random clutter.

The autotagger does not remove that decision-making process. It just gives me another thing to audit before I make the decision myself.

Okay, the Rant Ends Here

That rant is genuinely how I feel when people talk about autotaggers as if they automatically save time for everyone.

Maybe they do save time for some workflows. Some people have huge datasets, do not need highly detailed captions, or are fine with captions that are only “good enough” for their training goal.

That is completely valid.

For me, though, autotagging often creates a different kind of work. I still need to review the tags, remove the wrong ones, fix inconsistent ones, and add the important things that were missing.

So it is not that I think autotaggers are useless, or that I think people should avoid tools completely. I already use tools myself. The difference is that I want my tools to make the workflow easier, not make the tagging decisions for me and leave me to repair them afterward.

I prefer having control from the beginning. I would rather inspect the image, decide what I want the model to learn, and build the caption through my own tagging flow instead of correcting an autogenerated caption afterward.

At the end of the day, people should use the workflow that actually fits them.

Mine just happens to be manual-first tagging with tools that make the manual part less miserable.