NanashiAnon's Cool LoRA Training Guide for Attractive People

Intro

(Disclaimer: I only have any real LoRA experience with PonyXL. Most of this guide is still largely applicable to SD1.5, SDXL, and Illustrious from what I understand of them. There’s some overlap with Flux/SD3.5, but natural language tagging and their data set standards will differ a good deal.)

Explanations of LoRA making are often overcomplicated, and even many experienced image makers seem to think it's a pretty tricky ordeal. It’s actually a lot easier than it sounds: some parts are mildly tedious and take a bit of time, but none are actually difficult. Making a trained LoRA largely comes down to only four steps and (if using the on-site trainer) 500+ Buzz (payable with blue).

  1. Gather data set

  2. Clean data set (not always needed)

  3. Tag data set

  4. Set settings and train data set

The amount of time you need to make a LoRA depends primarily on how long step 2 takes you (often as little as "nothing") and on how you acquire the training data for step 1. Step 3 takes an amount of time that depends in large part on your interface speed (how fast you can type, change tabs, etc.). Overall I estimate that if I really wanted to (and I don't), with my experience level, I could probably cram out a LoRA for a character I knew, could easily find data for, and wouldn't have trouble picking a set of tags for, in about 30 minutes, plus the roughly 1 hour (and possibly another hour in queue if the site is busy) it takes to train after submitting everything (an entirely hands-off process).

This guide will be a bit long, but it's mainly so I can cover a lot of things that are only applicable to a limited number of LoRA types.

1: Gather Data Set

What's a data set?

In simplest terms, this is just getting a lot of pictures of your subject matter. The exact details are going to vary by what you’re making a LoRA of. For the rest of the guide I'm going to assume characters, because that's what most people start with and what I have the most experience with. In general, for characters you want pictures showing the character's full body (hair and hat to footwear) in a variety of poses, orientations (front, back, side) and art styles (unless you want your results to favor a particular style; many do, primarily when it comes to animated characters or a particular artist). You want to deprioritize images with other characters and (especially) text, but it's often possible (sometimes even easy) to get rid of both issues with cleaning (see below), and for at least PDXL multiple characters in some training data is workable (I've heard 1.5 really suffers if you use such, but can't confirm). Having a perfect set of training data is rare, especially with obscure characters, but perfect isn’t needed.

Actually getting your pictures is going to vary by the subject. Whenever possible, you want to start with “official art”, as it is almost always correct in details (by definition, though I've encountered some odd inconsistencies within official art before), well drawn, and frequently full body on a simple background. You can typically find it on boorus (using the tag “official_art” with the media name), fan wikis and (through Wayback if needed) official websites, or just an image search for the character. For games, box art (GameFAQs consistently has it in good quality, front and back) and manual scans (finding these is hit or miss) often have what you’re looking for. Don’t be afraid if the art isn’t in color: SD (including SDXL and PDXL) doesn’t actually care much about colors and learns more off shapes (there are articles about this and I’m not really qualified to explain in detail). I have actually trained a model on entirely monochrome images (and another where the vast majority are monochrome) and it adopts colors you specify perfectly (and even does a good job of guessing without).

Animation

For animated characters with lots of screen time, getting a data set is as simple as getting a bunch of screenshots. This type is the most covered in other guides, so I’m not really emphasizing instructions on it here. The one point I don’t see mentioned in those, which I’ll point out, is that anime (followed by games with anime cutscenes, then western animation in a distant third) is the medium most likely to have settei (設定) in existence. These model sheets of production lineart are highly desirable since they’re going to be correct by definition, full body, and literally made to show how to draw a character. There’s at least one English website collecting these, and searching in Japanese can often find more.

Video Games

For game characters it’s going to vary by what kind of game they’re from (and how much they’re in it). For older games (PS1/DS and earlier) you’ll rarely get more than one or two pics (generally from the opening, ending, or character selection) from the game itself (the rest will simply be too small), and you can often find higher quality versions of these few on the internet. Unless it's a game full of cutscene stills or (for PS1/Saturn) has animated cutscenes, you’ll have to largely rely on official art and fanart. Newer games are likely to have higher quality sprites or decent quality 3D models for their characters, which presents its own options. If you can rip the model, it's decent quality (roughly PS2 era onward), and you have some way of obtaining free camera control or just good native camera controls (and ideally a way to turn the HUD/crosshair off; more games have it as an option than you’d expect), you can often get 9 to 13 pics from a good model by making use of camera orientation tags (“from side”, “from behind” etc.) in the right combination (I’ve got an article for it mostly written out and will publish it once I've tested 3D model based LoRAs a bit more, but it’s too big a space hog to include here).

Comics and Manga

I’ve already got a dedicated article covering using manga and comics to build data sets. PDXL is more tolerant of weird crops than you’d expect, though I'd favor a larger data set if the nature of the character's original media means I have to rely on these (also, for many characters who only appeared in a single or few issues, I like knowing I put literally every image of the character into the training data).

Fan Art

Using fan art for a data set is applicable to most media. You can find it on sites like boorus, Pixiv and more. For Pixiv, knowing the work's and character’s Japanese names is very, very helpful. Note that a lot of art there will only be tagged with the series name, not the character name, especially when the artist was drawing multiple characters from the work, as Pixiv only supports a low number of tags per image. You can occasionally find stuff via normal internet image search as well, especially for relatively obscure Japanese works when searching in the native language (a lot of bloggers will draw art of scenes they found funny, and you’ll sometimes find social media posts with fanart). The problem here is the total inconsistency in quality control: you’ll find quick sketches and professional level pieces aplenty. More importantly, a lot of fan artists will miss details or get certain aspects wrong (I've got a data set ready for cleaning and already rejected 6 of the 29 pieces of fan art I collected when I realized they drew her with a skirt when she actually has loose shorts; some of these may get cropped to hips up). You need to curate what you take from here, but it’s often the vast majority of what you’ve got to work with.

How much training data do you need?

One of the big questions with LoRA making is how many pictures you’ll need. I’m not aware of any real tests on this (please comment if you have one), but to my knowledge the commonly recommended minimum (that I personally haven't proven wrong) for characters with a single outfit on Pony (SD 1.5 will need more) is generally 20, and past 30-35 there are diminishing returns that make it preferable to consider culling your worst images instead of adding more. The tricky thing is that the quality of your images is often more important than sheer quantity: I've seen a LoRA made with just 8 high quality pictures on clear backgrounds that worked very well. I think complexity also plays a part in how many images you’ll need (said LoRA is of a character with a relatively simple design), as does how many parts you can assign a proper tag to (with a complex design that defies tagging needing the most data). The numbers will also be higher if you want more than a single version of a character, such as a second outfit, though such extras don't double your data needs (my estimate for the pics needed for an alternate costume is "at least 6", though again the complexity of the alternate outfit likely plays a part). I make a point of publicly listing the number and general type of training data pictures for all my LoRAs in the description precisely because this question lacks a clear answer and making this information public can demystify it.

One final note: Avoid “synthetic” (AI generated) training data. Feeding AI to AI produces a very distinct form of broken model. I’ve only ever seen poses/concepts come out well, and those were all by someone who very much knew what they were doing.

2: Clean Data Set

This step is easiest to explain, but the hardest to actually do. At its simplest, cleaning a data set is just using basic image manipulation techniques and tools to remove unwanted elements from training data. This includes things like cropping a crowded scene to show only the character you are trying to replicate, erasing text, or cropping a single picture with multiple different views of the character to their own file. Depending on what your data set includes, you may not actually have to do this or do very little (some cropping, delete text off a white background). I prefer to save all the modified versions (and all pictures that need no manipulation) in a folder labeled “Final” within the folder I’m working on and move the originals to a folder called "Obsolete" so I know I've already worked on them.
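The “Final”/"Obsolete" folder routine above can be scripted if you prefer. The following is a minimal sketch, not a required tool: the function names and the list of image extensions are assumptions made up for illustration, and only the two folder names come from the workflow described above.

```python
from pathlib import Path
import shutil

# Assumed set of image extensions; adjust to whatever your data set uses.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def prepare_folders(workdir: Path) -> tuple[Path, Path]:
    """Create the "Final" and "Obsolete" subfolders inside the working folder."""
    final_dir = workdir / "Final"
    obsolete_dir = workdir / "Obsolete"
    final_dir.mkdir(parents=True, exist_ok=True)
    obsolete_dir.mkdir(parents=True, exist_ok=True)
    return final_dir, obsolete_dir


def pending_images(workdir: Path) -> list[Path]:
    """List images still sitting in the working folder, i.e. not yet handled."""
    return sorted(
        p for p in workdir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def accept_as_is(image: Path, final_dir: Path) -> Path:
    """Move a picture that needs no manipulation straight into "Final"."""
    return Path(shutil.move(str(image), str(final_dir / image.name)))


def retire_original(image: Path, obsolete_dir: Path) -> Path:
    """Move an original into "Obsolete" once its cleaned copy is saved in "Final"."""
    return Path(shutil.move(str(image), str(obsolete_dir / image.name)))
```

Anything `pending_images` still returns is a picture you haven't decided on yet, which makes it easy to stop partway and pick the cleaning back up later.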

One small tip here is that you don’t actually need to have a head in all your training data, and you can crop out the character’s head in pictures if you need to. Tags like “head out of frame” and "lower body" are available for tagging such crops. I've done this when I determined that the character's head was poorly drawn (but the body fine), or the upper part of the image was too crowded with other characters.

3: Tag Data Set

Once you’ve got your data set gathered and cleaned, you need to tag it. During training the LoRA learns to recognize the tags you've used repeatedly with the shapes in the associated data, so a “pleated skirt” will become the skirt of the character you trained it on and so on. You can teach both unique words (such as your character's name) and refine understanding of existing words (such as the clothing your character wears or their hairstyle). This section is a bit long, but it's more because there's a lot of caveats to cover, even if the process itself is relatively simple.

Getting ready to tag

After (or while) cleaning your data set, come up with a general idea of how you’re going to tag your character. Essentially you’re going to be tagging every picture as though it were a very well tagged booru listing, or the prompt you’d use to generate such an image (minus Pony score/control tags). You want to start with a trigger word that is essentially the character’s name for the purposes of the LoRA. You want this to be unique, to avoid colliding with what the base model already knows about the character (even if limited) or about similarly named characters. Avoid spaces, since a trigger made of multiple words isn't parsed as readily. Also be aware that CivitAI’s on-site generator can mistake more normal sounding names for some real person you’ve never heard of and block generations with them. Don’t overcomplicate this: the character’s name, an underscore, and the work’s initials or abbreviation is sufficient (Penny_IG, Miku_Voc), and it helps if people can type it easily. After that you need to figure out a set of tags that cover the character’s physical characteristics and their outfit. This is going to be more complicated the more complicated the character and their attire are. Having a lot of valid tags is actually preferable and helps with quick learning (though there are certainly characters who have limited tag options). My experience is that very complex outfits with a tag for every piece are picked up very quickly, but outfits of only moderate complexity with limited tags (such as a very particular jacket that can only be described as "jacket") take much longer (and are thus less understood by the final LoRA).
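The trigger naming pattern above (name, underscore, work abbreviation, no spaces) is simple enough to express in a few lines. This is only a sketch: `make_trigger` is a hypothetical helper, and stripping everything except ASCII letters and digits is an assumption made here to keep the trigger easy to type.

```python
import re


def make_trigger(name: str, work_abbrev: str) -> str:
    """Build a trigger like "Penny_IG": name + underscore + work abbreviation."""
    # Assumption: drop spaces and punctuation so the trigger is one typeable word.
    clean_name = re.sub(r"[^A-Za-z0-9]", "", name)
    clean_work = re.sub(r"[^A-Za-z0-9]", "", work_abbrev)
    if not clean_name or not clean_work:
        raise ValueError("both the name and the work abbreviation need letters or digits")
    return f"{clean_name}_{clean_work}"


print(make_trigger("Penny", "IG"))   # Penny_IG
print(make_trigger("Miku", "Voc"))   # Miku_Voc
```

Whatever you end up with, it is still worth generating a test image with the bare trigger on the base model to see whether it already means something there.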

If you’re not a master of booru tagging, you can see what the auto-labeler offers you (see step after next), and if your character has existing booru entries, you can start by checking them for ideas of what tags to use. Beware that both of these are often overly basic and sometimes even wrong (mistaking a skirt+jacket for a dress and the like). Note that SD (etc.) has some inherent properties attached to common clothing types. For example, a "jacket" can open, and typically opens down the center, and generations can actually figure that out even if you have no pics with the jacket open in training. If you have a general idea of a tag but don’t know if there’s actually one for it, you can search the list of tags on a booru with wildcard queries (“*shirt*” will find all tags with “shirt” in their name, “*shirt” will find all tags that end in exactly “shirt” but can have anything in front of them) and sort by most used. Anything with 500+ uses tends to be usable (maybe only 400+ if SD can connect it to non-tag uses in data, like “spacecraft interior”). You can also ask for help if you're really stuck.
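If you have a tag list saved locally, the same wildcard search and "sort by most used" can be reproduced offline. This is a minimal sketch using Python's standard `fnmatch` module; `find_tags` is a hypothetical helper and the use counts in the example are made up purely for illustration, not real booru numbers.

```python
from fnmatch import fnmatchcase


def find_tags(tag_counts: dict[str, int], pattern: str, min_uses: int = 500) -> list[tuple[str, int]]:
    """Return (tag, uses) pairs matching a wildcard pattern, most used first."""
    matches = [
        (tag, uses)
        for tag, uses in tag_counts.items()
        if uses >= min_uses and fnmatchcase(tag, pattern)
    ]
    return sorted(matches, key=lambda item: item[1], reverse=True)


# Made-up counts for illustration only.
EXAMPLE_COUNTS = {
    "shirt": 100000,
    "t-shirt": 40000,
    "striped shirt": 9000,
    "shirt lift": 7000,
    "shirtless": 3000,
    "gym shirt": 450,  # below the 500-use rule of thumb, so filtered out
}

print(find_tags(EXAMPLE_COUNTS, "*shirt"))   # tags ending in "shirt"
print(find_tags(EXAMPLE_COUNTS, "*shirt*"))  # tags containing "shirt" anywhere
```

The `min_uses` default mirrors the 500+ rule of thumb; lower it to 400 for tags you suspect the model can connect to ordinary language.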

One thing to leave out is eye color. As mentioned earlier, color is easily inserted at prompt time, and even official art is often very sloppy about keeping eye color consistent: shades of “green eyes” are all over the place (even without getting into the complicated stuff about language and the perception of color), and some art styles don’t really have a clear eye color to begin with. For some more ambiguous tags (“collar”) or rarer tags, be aware of what the tags you use will generate (the easiest way is to just gen a picture with that tag and minimal other stuff) before using them. If you’re training multiple outfits, try to use dissimilar tags for parts that are different (instead of shared accessories) when possible (such as one being a “bow” and the other a “ribbon” instead of both being bows).

Once you’ve got your initial tags, list it out in sequence in order of roughly head to toe. The order makes it a lot easier to implement the tags when tagging, and easier for the end user to use when you include this in the description, since you can just remove stuff like “socks, shoes” off the end if the camera is too high to show them. Also note down any image source tags for styles in your training data. It is very important to include these if your training data is 3D (video game screenshots or CG animation), and they are still helpful in general for controlling style even with 2D data.
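To illustrate why the head-to-toe order pays off, here is a small sketch of trimming a prompt from the end. `build_prompt` is a hypothetical helper, and the outfit tags in the example are invented for illustration.

```python
def build_prompt(trigger: str, tags_head_to_toe: list[str], drop_from_end: int = 0) -> str:
    """Join the trigger and ordered tags, optionally dropping the lowest items."""
    if not 0 <= drop_from_end <= len(tags_head_to_toe):
        raise ValueError("drop_from_end must be between 0 and the number of tags")
    kept = tags_head_to_toe[: len(tags_head_to_toe) - drop_from_end]
    return ", ".join([trigger, *kept])


# Invented example outfit, listed roughly head to toe.
OUTFIT = ["short hair", "red jacket", "pleated skirt", "socks", "shoes"]

print(build_prompt("Penny_IG", OUTFIT))
# Upper-body framing: the camera is too high to show socks and shoes.
print(build_prompt("Penny_IG", OUTFIT, drop_from_end=2))
```

Because the list is ordered, dropping items for a tighter framing is a matter of cutting off the tail rather than hunting through the prompt.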

The next step is going to vary a bit based on if you train locally or with the on-site trainer, but the general idea is the same. For the on-site trainer go to “create” on the top bar and select train. There pick the type of LoRA you want to train (such as character) and give it a name. The name here will determine the file name and to make it stable for on-site use you want a relatively short name with only “standard” English characters (see my dedicated article on the issue this works around. You can rename a resource’s page name later but this doesn't change file name.). Generally “[character name] ([work name])” is sufficient, though if the work has been rebooted under the same name or (like an American comic) had multiple artists and you're only capturing one you'll want to specify the incarnation. After that, upload your images and hit auto-label, listing tags you’ve decided on under blacklist, then set the tags to generate to 30 and start the process (this may take a few minutes). Now open the list of tags and look through it. Pick out any useful tags you didn’t think of (“short hair with long locks”) and add them to your list of tags to use.

Actually tagging

Now you’re ready to actually start tagging. Hit auto-label again, set it to overwrite, place your character’s trigger and face information (as well as source type tagging if you’re using only one type of media exclusively) in the prepend, put those and all other tags you’ve decided upon in the blacklist (this lets you avoid checking which of these tags have already been applied automatically), and click tag. Now go through the list of tags and remove those specifying eye color (but not type), those you deem erroneous (if the character wears a skirt, not a dress, remove “dress”), and those that are purely supertypes of clothing tags you’re already using (“shirt” if you’re using “blouse”, “pants” if you’re using “jeans”; stacking like this will make it harder to change the character into other outfits where parts have a shared supertype). The one exception is colors (skirt, pink skirt), which should be used redundantly because 1: it's needed if you have a mix of monochrome and colored data, and 2: it lets the LoRA produce monochrome or oddly colored styles when desired.
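If you tag locally, or just want to see the removal rules above spelled out, they can be sketched as a filter over a tag list. Everything here is illustrative: `clean_tags` is a hypothetical helper, the color list is an assumed starting point rather than a complete one, and the supertype pairs are just the two examples given above.

```python
# Assumed, incomplete list of color words that can precede "eyes".
EYE_COLORS = {"blue", "green", "red", "brown", "yellow", "purple", "pink",
              "black", "grey", "orange", "aqua"}

# Specific tag -> redundant supertype (the two pairs used as examples above).
SUPERTYPES = {"blouse": "shirt", "jeans": "pants"}


def clean_tags(tags: list[str], erroneous: frozenset[str] = frozenset()) -> list[str]:
    """Drop eye-color tags, known-wrong tags, and supertypes of tags already present."""
    present = set(tags)
    redundant = {SUPERTYPES[tag] for tag in present if tag in SUPERTYPES}
    cleaned = []
    for tag in tags:
        words = tag.split()
        # "green eyes" is a color tag; "closed eyes" describes type and is kept.
        is_eye_color = len(words) == 2 and words[1] == "eyes" and words[0] in EYE_COLORS
        if is_eye_color or tag in redundant or tag in erroneous:
            continue
        cleaned.append(tag)
    return cleaned


auto_tags = ["Penny_IG", "green eyes", "closed eyes", "blouse", "shirt",
             "skirt", "pink skirt", "dress"]
print(clean_tags(auto_tags, erroneous=frozenset({"dress"})))
```

Note that "skirt" and "pink skirt" both survive: color variants are deliberately kept alongside the plain tag, per the exception described above.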

Now you’re ready to actually start tagging your pictures. This is really just a matter of tagging your chosen tags (I like copy-pasting from a text document) and any other things you can find tags for (style, pose, background elements, camera framing), while removing erroneous tags (This includes the face details you automatically applied if e.g., they are facing away and you can’t see their “bangs”. It's generally far easier to remove incorrect tags from a few images than to add correct tags manually to the majority of images).

The on-site trainer doesn’t actually let you see the full image normally, cropping it to fit in a square box (I am once again asking for Civitai to fix this), but you can work around this by copy-pasting it into an image editor (preferably a lightweight one like (Kolour)Paint since you'll be doing it plenty) or, if you didn't zip your pictures and just uploaded them with multiselect/manual dragging each, by right clicking and opening it in a new tab (doing this with images uploaded as part of a zip returns a broken page: a bug preventing a workaround to a bug).

After all that, review your list of tags to make sure nothing is weird or broken (like “socks shoes” instead of “socks, shoes” or "ocks" instead of "socks"; you can fix such missing commas or partial words quickly with the replace function), then download your tagged training data. Occasionally your training will fail and you’ll be refunded but have to start again, and having a local copy makes that easier (I’ve only had it happen… thrice ever, all in my early days). Even if training does succeed, CivitAI only stores your TD for 30 days and you may want to try a V2 in the distant future; you can DL the TD from the training page, but you don’t want to risk it. You can also DL your TD during tagging if you want to pause, since your tagging isn't saved if you leave partway through.
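Once you have the tagged data downloaded, the two kinds of breakage mentioned above (a missing comma, a partial word) can also be caught with a quick script. This is a rough sketch, assuming each caption is a comma-separated string of tags; `find_suspect_tags` is a hypothetical helper, and its checks are heuristics that will miss some problems and occasionally flag fine tags.

```python
def find_suspect_tags(caption: str, known_tags: set[str]) -> list[tuple[str, str]]:
    """Flag tags that look like two known tags fused together or a cut-off word."""
    problems = []
    for raw in caption.split(","):
        tag = raw.strip()
        if not tag:
            problems.append((tag, "empty tag (doubled or trailing comma)"))
            continue
        if tag in known_tags:
            continue
        words = tag.split()
        # Missing comma: the tag splits cleanly into two known tags.
        fused = any(
            " ".join(words[:i]) in known_tags and " ".join(words[i:]) in known_tags
            for i in range(1, len(words))
        )
        if fused:
            problems.append((tag, "possible missing comma"))
        # Partial word: the tag is a fragment of a word used in a known tag.
        elif any(tag in word and tag != word
                 for known in known_tags for word in known.split()):
            problems.append((tag, "possible partial word"))
    return problems


KNOWN = {"Penny_IG", "pleated skirt", "socks", "shoes"}
print(find_suspect_tags("Penny_IG, pleated skirt, socks shoes", KNOWN))
print(find_suspect_tags("Penny_IG, ocks, shoes", KNOWN))
```

`known_tags` would be your own planned tag list; tags outside it that don't trip either heuristic are left alone, since the auto-labeler adds plenty of legitimate extras.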

4: Set settings and train data set

The easiest part. You just need to adjust certain settings for the training before you actually start it. I’m assuming the on-site trainer and its defaults unless otherwise specified, but most should apply to other methods.

First set the model type. I’ve only used “Pony” (bar one 1.5 at request), but I think most of this will be applicable to SD1.5 and SDXL (Really, if you’ve come this far you know which model you want). If you somehow got this far and intend to use Flux, TURN BACK! Data tagging for Flux is completely different and outside my knowledge set.

Next enter your prompts for training images (on the on-site trainer). I’ve got a full guide on getting the most out of this, but the short version is you want to avoid CivitAI’s “feature” to auto generate a prompt, because it will be worthless, and specify everything (because if you leave it to chance, you're stuck with 9 more images of it doing something you didn't want). After that just set your training options. Since you’re using tags, you’ll want shuffle captions on. For a character that doesn't have a huge data set, setting Network Dim and Network Alpha to 16 will halve your LoRA’s file size (from ~200 MB to ~100 MB) at, as far as I’m aware, minimal to no loss of quality for a LoRA of that size (likely a bad idea for something with very large data sets and lots of training though). If you have a fully symmetrical character, you can enable flip augmentation, though I’ve found a lot of characters with symmetrical outfits actually have minor non-symmetrical details (most commonly the bangs), so I barely get to use it. The tooltip for clip skip suggests changing it for anime, but I have no idea if that's right or not (and haven't really done any purely anime based data set LoRAs except Gundam X style, where I think I followed that instruction). You can also increase the settings to some degree without going past the 500 Buzz minimum price, but I don’t know what the best is here (I’ve left everything I haven't talked about here at default and it's been fine).
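The halving is not a coincidence. In a standard LoRA, each adapted layer stores two small matrices whose sizes scale directly with the rank (what the trainer calls Network Dim), so the parameter count, and with it the file size, is proportional to the dim. The sketch below shows the arithmetic; the layer shapes are hypothetical placeholders rather than the real shapes of any model, and the comparison against 32 assumes a default dim of twice 16, as the halving implies.

```python
def lora_param_count(layer_shapes: list[tuple[int, int]], rank: int) -> int:
    """Parameters a rank-`rank` LoRA adds across the given linear layers.

    Each layer with shape (in_features, out_features) gets a down matrix of
    rank x in_features and an up matrix of out_features x rank.
    """
    return sum(rank * (in_f + out_f) for in_f, out_f in layer_shapes)


# Hypothetical layer shapes, for illustration only.
EXAMPLE_LAYERS = [(640, 640), (1280, 1280), (2048, 1280)]

at_32 = lora_param_count(EXAMPLE_LAYERS, rank=32)
at_16 = lora_param_count(EXAMPLE_LAYERS, rank=16)
print(at_16 / at_32)  # ratio of parameter counts between dim 16 and dim 32
```

Network Alpha, as far as is commonly documented, only scales how strongly the learned weights are applied and does not change the number of stored parameters, so the size change comes from Network Dim alone.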

After you’ve set everything, confirm and let it sit for the hour+ it takes to bake. You can then publish your LoRA. Generally you want the newest (highest) epoch, but if you increased the epoch count past 10 there’s a chance higher epochs could be "overbaked" (you can tell when the stuff that the LoRA isn't explicitly trained on starts decaying in quality in the sample pics and/or when vestiges of things from training data that you didn't prompt for start inserting themselves). Note that when you click publish, you lose all other epochs and their sample pics (the samples for your chosen epoch are kept as the default gallery of your model), so save them locally if you want them or think you might want them.

5: Extra: Publishing your LoRA.

While you already have the LoRA, I’m going to give a few basic pointers on publishing it.

First: Have a proper description. At the bare minimum you should have a basic listing of who the character is (“Main character of X”, “A minor character from X that appeared only in the episode Y”), the tags used for the outfit, and a list of non-character tags that you used often in training data and that may be undesirable (such as an odd art style like "lineart") or that you see pop up unprompted in generations (including epoch sample pics). Past that, I like to list the number and general type of pics (X official art, Y anime screenshots etc.) because it’s typically one of the most underdocumented parts of LoRA making and having that info out there helps. You can do more for a description, but I prefer to make sure the intro isn’t so long it forces data on prompting to be under “show more” (you can have whatever you want under that info).

At this step you'll also be asked what permissions you want to assign. I normally enable all of them EXCEPT "Use different permissions on merges" (if enabled, someone can legally ignore the others with the bare minimum of work), "Sell generated images", and "Sell this model or merges" (self explanatory). You may want to disable "Use on other generation services" depending on your preferences, but there's very little good reason to disable use of Civitai generation service: You get buzz from people using it and they'll sometimes show you neat things they made with it. You get to set this per model (I have enabled full permissions on certain models based on only public domain/CC data).

On the next step of publishing you'll be asked to set information that's version specific. First, uncheck "This version doesn't require any trigger words" and add your trigger words and the prompts for your characters. It's preferred that the character's physical attributes (hair style, eye color, body) go in one block (list them, then hit enter, then start the next block) and the outfit go in another (and any second+ outfits can go in another block). "Optional" accessories that aren't as "attached" to the character (such as a bag) and/or are problematic to generate (such as held items), alternate features where most other tags are still applicable (e.g., Super Saiyan hair+aura), and pieces of clothing only seen in certain circumstances (e.g., a character who is normally shown with a closed jacket might have a block of "open jacket, t-shirt, print shirt" if you've got enough of such a variant in training data to consider it trained) can go in their own blocks.

You'll also be asked to input a recommended strength for your LoRA. I have no idea why there isn't a default here, but unless you've got a good reason, just set the minimum to 0.1, the max to 2, and the default strength to 1. Those numbers will be fine in almost every case.

X: Misc. Questions:

How do I make a LoRA NSFW Capable?

Unless you’re making an SD1.5 LoRA, or a style LoRA, you don’t need to, and any mention of it in a Pony LoRA is likely just a vestige of the SD1.5 days. SD1.5 had an issue where characters only trained on a single outfit, especially if you couldn’t nail every part to a tag, were a mess to take out of that outfit, and it wanted light/no clothing in the data set to be able to accurately understand the character’s body shape. Pony doesn’t really have this issue in the first place, so as far as I know "NSFW capable" on a character isn’t something you need to shoot for unless you're actually baking for 1.5 (can't tell you about base XL).

Styles can do lewd stuff without having been trained on it, but it helps a lot if the style has samples of how it does naughty parts (or, at least, skimpy clothes) should they exist. This will make it so the style LoRA can exactly duplicate things like how an artist draws nipples. It's not needed, but it helps make a good facsimile of a detail most users will care about.

Non-Character LoRAs?

I honestly don't have enough experience with these to say for sure, but from that limited experience and my research I think it is the same general idea of gathering data sets and tagging them with everything. I don't know the lower limit, but styles for SD are generally considered to want a much higher quantity of training data, especially if they're general purpose (rather than a single perspective/purpose like portraits). Adding characters to a style is just a matter of including a trigger for each character and describing them consistently, but beware of overlaps in triggers. Be warned that non-characters are generally going to take much longer, since you will have to actually think about the tagging instead of just being able to copy/paste the applicable parts, and you have many more images to work on.

One point of general advice I can give is that if you're trying to make LoRA of something you might find in the public domain (Remember: Some governments, including US, make everything created by government employees on duty public domain automatically so you can actually find a weird variety of stuff) like a location or military equipment, you should check Wikimedia Commons. They've got a lot of photos you can use for whatever (beware some are CC-By and/or CC-SA rather than public domain) and it's got a working search (with ability to filter by license).

How do I thank you for this wonderful guide?

Use some of my LoRAs and post the results to their respective gallery (not a crosspost). I love seeing what people do with my LoRAs and it's always a shame to see them DLed and reviewed without me getting to see what was made with them.
