
Baking Outfit LoRAs with Only a Few Screenshots: My Adventures in Creating VG Armor Models

INTRODUCTION

Welcome to my first article! This is me writing down what I've learned from creating Final Fantasy XI armor loras using only a few crappy screenshots and maybe a fanart or two.

I'm still learning, so if this interests you, check back in for updates every once in a while, and please provide corrections if you see any mistakes.

Much of my learning started with this article, so give it a read for a fuller understanding of making loras on civitai: https://civitai.com/articles/9005/a-detailed-beginners-guide-to-lora-training-on-civitais-trainer. Great job, HidFig!

Just to clarify, if you have a decent amount of screenshots, creating synthetic data is not necessary; just make sure to tag the screenshots as '3d' and maybe 'screenshot' and the checkpoint will understand on its own how to render non-3d styles from it. Thank you to veteran baker NanashiAnon for mentioning this.


CREATING TRAINING IMAGES

I'm trying to create the Dragoon's first AF armor from the game Final Fantasy XI, an older game with dated graphics. Here's an example of what I'm working with (and this is AFTER upscaling!):

[Image: latest.png]

Okay, so we don't have many training images, and the ones we have aren't very good. Not to worry -- we can use what we have to create more.


FIRST, before we even create any synthetic data, we need to make sure our raw data is large enough for the model to "see" everything properly and render an accurate redraw. Upscale any small images first. If you don't have your own upscaling workflow, you can just use waifu2x.
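If you'd rather script the upscaling step, here's a minimal sketch using a plain Lanczos resize in Pillow. This is a hypothetical fallback, not the waifu2x pipeline itself; waifu2x will generally look better on anime-style screenshots, but any upscale gives the model more pixels to work with.

```python
# Hypothetical fallback upscaler using Pillow's Lanczos filter.
# waifu2x is still the better choice for anime-style screenshots.
from PIL import Image

def upscale(path_in: str, path_out: str, factor: int = 2) -> None:
    img = Image.open(path_in)
    w, h = img.size
    # LANCZOS is Pillow's highest-quality built-in resampling filter
    img.resize((w * factor, h * factor), Image.LANCZOS).save(path_out)
```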

This is also the time to do any touch-ups if you know your way around GIMP/Photoshop/etc. Put in the effort to remove as much visual interference as possible from the images. Crop aggressively. Smooth out any major pixelation or jaggedness, if you're able.


Nano Banana (NB) is your friend. Of all the free, easy-to-use image-gen services I've discovered, it is the most useful for doing accurate redraws. Make sure to get your prompt right, though: when I wrote this, the free version only allowed two Pro gens per day, and regular Nano Banana wasn't nearly as good. (You get more uses nowadays and vanilla NB has improved, but it's still good practice to get the prompt right the first time.) I submitted a clear-as-possible screenshot of the armor with the following prompt:

I have attached a screenshot of a character. The screenshot is grainy and pixelated. Please take careful note of the armor design. You will render the outfit in more detail, matching the outfit as closely as possible, concept art style, rich colors, plain white background. Please show the armor from multiple views/angles, landscape orientation, high resolution as possible.

The result looks good:

[Image: Gemini_Generated_Image_xdups6xdups6xdup.png]

Once we've run out of Nano Banana Pro (NBP) uses, we can also use Sora to work with the images NBP gave us. (Update: R.I.P. Sora. As soon as I find another good, free secondary generator I'll add it here; thankfully Nano Banana allows more uses these days.)

Try to make the results as diverse as possible for healthy training data. Some examples of things to do:

  • Render images without the helmet (during baking, make sure to tag them as 'no headgear')

  • Render male and female body types (if the M/F outfits are different in some way, make sure to have a tag that differentiates them; see below)

  • Render different poses

  • For pictures with multiple views, crop each view out into its own image

  • White backgrounds are better than busy backgrounds; the less visual confusion the better

  • And of course if there is good fanart of the outfit, then use that too, but only if it's accurate -- we're going for fidelity here.
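Since the multi-view renders need to be cropped into individual images, a small script can speed that up once you have more than a handful of sheets. This is just a sketch: the crop boxes are hypothetical and you'd eyeball them per image (hand-cropping in GIMP is fine for small datasets).

```python
# Sketch: split a multi-view render sheet into separate training images.
# The crop boxes are per-image guesses, not something automated here.
from PIL import Image

def split_views(sheet_path: str, boxes: list[tuple[int, int, int, int]], stem: str) -> list[str]:
    sheet = Image.open(sheet_path)
    paths = []
    for i, box in enumerate(boxes):  # box = (left, upper, right, lower)
        out = f"{stem}_view{i}.png"
        sheet.crop(box).save(out)
        paths.append(out)
    return paths
```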

An example of my Sora prompt, using the armor art that Nano Banana gave me:

Attached is a picture of armor. Please render a man with a clean-cropped brown beard-mustache like Riker from Star Trek in the armor, striking different random poses. He is not wearing the helmet. White background.

The result. Lookin' good, Riker!

[Image: 20260130_1001_Image Generation_remix_01kg7xjebrfj4r2t3z4dyk9pjs.png]

Using these methods, I went from a few crappy screenshots to ~25 decent images. This isn't a high number by any means, but it's enough to train on.


PRE-BAKING

Tagging is supremely important for a number of reasons.

Make sure that anything that is 3D, pixelated, a screenshot, etc. is tagged as such, so the model differentiates those styles and doesn't render everything that way.

Tags are also important for making the outfit more 'modular', e.g. adding/removing/changing components of the outfit. Tag everything. Use the auto-tagger for the obvious stuff, but go through each one and tag thoroughly. Set the allowable tags to maximum (30). No idea why it defaults to only 10.

  • As an example, the AI sometimes interpreted the winged helmet as horns; use the tag viewer to look for any spurious tags like 'horns' and delete them.

  • I didn't bother using male/female tags as the only real difference is the thigh cutout, so I just used that tag. If it's not necessary to make up your own tag to differentiate the outfit types, then don't do it. This way, for example, users can give a male character the thigh cutout if they want, or a female character the full pants.
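Civitai's tag viewer handles this in the UI, but if you tag locally with kohya-style sidecar .txt captions (one comma-separated tag list per image), a few lines of Python can strip spurious tags like 'horns' across the whole dataset. A hedged sketch, assuming that caption layout:

```python
# Sketch: remove auto-tagger hallucinations from kohya-style .txt captions.
# Assumes one comma-separated tag list per image file.
from pathlib import Path

SPURIOUS = {"horns"}  # tags the auto-tagger invented; extend as needed

def clean_captions(dataset_dir: str) -> None:
    for txt in Path(dataset_dir).glob("*.txt"):
        tags = [t.strip() for t in txt.read_text().split(",")]
        kept = [t for t in tags if t and t.lower() not in SPURIOUS]
        txt.write_text(", ".join(kept))
```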

Finally, make useful prompts for the epochs' sample images, to help you decide which epoch to use. Civ always recommends the last epoch, but it won't necessarily be your preferred one, especially if you see signs it's getting over-baked.


BAKING

Normally I use 12-15 epochs and maybe 4-8 repeats; however, with so few training images this needs adjusting. We'll try 10 repeats this time -- effectively, the robot needs to focus more on the few images we do have. If I had fewer than 20 training images, I would bump the repeats to maybe 12-15.

An LLM I consulted recommended getting the step count above 1,000. With 10 repeats, that means we should bring up the epoch count as well: 18 epochs at 10 repeats results in 1,260 steps. Good enough.
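The step arithmetic is worth sanity-checking before you queue a run. Using the usual kohya-style formula (steps per epoch = ceil(images × repeats / batch size), times epochs), the 1,260 figure works out only if the dataset is exactly 28 images at batch size 4 -- the "~25" above is approximate, so treat the image count here as an assumption:

```python
# Back-of-the-envelope step count, kohya-style formula.
# image count of 28 is an assumption inferred from the article's 1,260 steps.
import math

def total_steps(images: int, repeats: int, epochs: int, batch_size: int) -> int:
    steps_per_epoch = math.ceil(images * repeats / batch_size)
    return steps_per_epoch * epochs

print(total_steps(images=28, repeats=10, epochs=18, batch_size=4))  # 1260
```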

For DIM and Alpha:

  • DIM 16 β†’ safer, more general

  • DIM 24 β†’ more detail, slightly more risk of overfitting

  • DIM 8 β†’ might underfit with low‑res data

So we'll go with 16. This will make the file size a little higher than I usually have it, but that's one of the downsides of minimal training data!


SUMMARY

All of this now gives us the following (anything I don't mention, leave as-is):

  • Epochs β†’ 18

  • Repeats β†’ 10

  • Batch size β†’ 4

  • Resolution β†’ 1024

  • Enable Bucket β†’ Sure, why not

  • Shuffle tags β†’ Sure, why not

  • Flip Augmentation β†’ Yes, but only because this armor is symmetrical. If an outfit has any asymmetries you want to preserve, do not use this. It effectively flips images randomly to create more training data, but this will cancel out any asymmetries.

  • Keep tokens β†’ 1. Keeps the main trigger tag for all trained images to help lock in that association.

  • Clip Skip β†’ 2

  • Network DIM β†’ 16

  • Network Alpha β†’ 16

  • Noise Offset β†’ 0.05

  • Optimizer β†’ Prodigy
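For reference, here is roughly how these settings would look in a kohya-style training config. This is a hypothetical sketch, not Civitai's actual backend config -- the trainer exposes these in its UI, and exact field names vary between tools and versions:

```toml
# Hypothetical kohya_ss-style fragment; field names vary by tool version.
max_train_epochs = 18
train_batch_size = 4
resolution = "1024,1024"
enable_bucket = true
shuffle_caption = true
flip_aug = true              # only because this armor is symmetrical
keep_tokens = 1              # protects the trigger tag from shuffling
clip_skip = 2
network_dim = 16
network_alpha = 16
noise_offset = 0.05
optimizer_type = "Prodigy"
# repeats (10) are usually set per dataset folder, e.g. a "10_<trigger>" directory
```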


Anyway, I think that's the meat & potatoes of what you need. Check back as I'll be garnishing the article with more as I learn. Please look forward to it! Thanks for reading.
