
Valstrix's Crash-Course Guide to LoRA (& LyCORIS) Training

Originally a copy-pasted discord guide I wrote, this is my own two-cents for the Civit crowd. With so many guides out of date and/or with incorrect information, I hope this will be helpful to aspiring and current trainers alike.

Keep in mind this is based on my own personal workflow and experiences! It is by no means perfect, and I don't plan to make this an absolute encyclopedia, either. Just a good ol' crash-course to get your feet wet - though it's evolved into something akin to a handbook, now.

I am also going to assume you have a system capable enough for local training - I recommend 6 GB of VRAM at an absolute minimum. (Though most of this should also apply to training in other environments, such as Colabs or the on-site trainer.)

That being said, this guide will assume you haven't even gathered a dataset yet, so let's dive in!

Disclaimer: Throw out whatever you learned from that YouTube video from Mr. Generic AI Man posted 3 months ago - YouTube is always behind/outdated, don't confuse yourself.

Note: FLUX settings and more detailed coverage of the model are coming Soon

  • Meanwhile, new big-batch settings have been added, compatible with both SDXL and SD 1.5!

    • The SDXL version will eat ~24GB of VRAM though, be warned!

Part 1 | Datasets: Gathering & Basics

Your dataset is THE MOST IMPORTANT aspect of your LoRA, hands down. A bad dataset will produce a bad LoRA every time, regardless of your settings. Garbage data in gives garbage data out!

Ideally, training a good LoRA will use a decent number of images. For SD1.5, I recommend ~30 images, with a minimum of ~15 - though I have used as few as 8 and gotten a 'decent' result.

For SDXL, I've had the best luck with ~30 images, with an absolute minimum of ~12. Some folks have found half-decent ways of doing single-image trainings on XL, but I haven't tried them myself yet. Generally, try to have at least 20 images to work with.

When using my big-batch settings, I've found that trainings tend to work better with ~40 images over 30 - the more the better, generally.

That said, you can go higher or lower than my recommended numbers, but generally speaking those are some good values to shoot for - but don't over-saturate your dataset! Having too many images will needlessly slow your training for rapidly diminishing returns, and depending on the contents can give you more trouble than it's worth. For most non-style instances, I would cut off your dataset at 50 images maximum.

If you're struggling to get a large enough dataset, don't worry too much, as there is a method to artificially expand your dataset, but we'll touch on that later.

When assembling your images, ensure you go for quality over quantity. A well-curated set of 30 images can easily outperform a set of 100 poor and mediocre images. Especially in smaller datasets, a single "bad" image can offset the entire model in an awful way. That being said, bad images CAN be used to pad out a dataset, but should be tagged properly (such as with "colored sketch", which will be talked about later).

You should also ensure your images are varied. Too many images in the same or a similar style will bake that style into your concept, making style changes exceptionally difficult and biasing any that do work. Be especially careful when dealing with lots of screenshots and renders. If you do have a significant amount, tagging them with the artist that made them, as a render, etc., can help tie the style to another tag and reduce the impact. Conversely, incredibly distinct/powerful styles can influence the overall learned style even in small numbers, so I recommend tagging them as well.

I would also recommend avoiding fetish-themed images when working with characters (unless you want that out of your LoRA), as even when tagged their often extreme anatomy can skew your model in a way worse than if they weren't present at all. You can of course use them to expand your dataset if you truly need to, but make sure they are tagged thoroughly.

Personally, I gather my data from a variety of sites: Game wikis, pixiv, deviantart, danbooru, e621, & furaffinity are my common sources. Again, make sure you try and avoid pulling too much from the same artist and similar styles. Google image search is also worth looking at if you need more data, as it can often find isolated instances from reddit, steam community feeds, and other sites you may not have thought of looking through.

As a JP art site, Pixiv is a godsend for obscure/less popular eastern franchises, but finding specifics can be difficult at times, as you need to use Japanese text to guarantee your search. Thankfully, a number of wikis also include Japanese names if applicable. Note: As of 4/25/24, Pixiv has blocked 'sensitive content' in the US and UK - This can be bypassed by setting your account region to anywhere outside of those locales, should you care about using said content for training.

As you gather your images, you should consider what it is you want to train: A concept or a style. A "concept" can be anything that would only affect part of a generated image: A character, outfit, object, etc. A "style" is anything that will affect the entire image, such as themes, art styles, or world morphs.

For concepts, you should primarily look for solo images of the subject in question. Duos/Trios also work, but you should only grab them if your primary subject is largely unobscured. Alternatively, extra individuals can be easily removed or cropped out. If you do include multi-character images, make sure they are properly and thoroughly tagged. Including duo/trios/etc can be beneficial to using your LoRA in multi-character generations, but is not required by any means.

If you're planning on training a style, know that those are more advanced, and are better suited to LyCORIS than LoRA. This guide will still be largely applicable, but check the later parts for details specific to them.

Once you have your images, place them in a folder for preparation in the next part. (For example, "Training Folders/Concept Folder/Raw")

Part 2 | Datasets: Preparation

Once you have your raw images from part 1, you can begin to preprocess them to get them ready for training. You will need a photo editor program handy: I recommend Photopea as a free web alternative to photoshop. Paint.net & Krita are both valid options, as well.

Personally, I separate my images into two groups: Images that are ok on their own, and images that require some form of editing before use. Those that meet the below criteria are moved to another folder and then edited accordingly.

Take a look at the extension of your images. .webp images (usually pulled off wikis) are incompatible with current trainers, and must be converted to a PNG or JPEG. While you do that, note that images with transparent backgrounds also cause issues. These should be brought into your image editor of choice, and should be given a background. I would recommend using multiple solid colors, if you have more than one - background variation can be incredibly useful. Alternating between white/black/blue/green/red/etc, and tagging them as such, can help your training if backgrounds are causing issues, and just in general.
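If you'd rather batch this than do it by hand, here's a minimal Pillow sketch of the idea - the folder path and background color are placeholders for your own setup, and you should still spot-check the results afterwards:

```python
# Minimal sketch: convert .webp files to .png and flatten any transparency
# onto a solid background color. Path and color are placeholders.
from pathlib import Path
from PIL import Image

raw_dir = Path("Training Folders/Concept Folder/Raw")  # your raw image folder
bg_color = (255, 255, 255)  # white; vary per image if you want background variety

for img_path in list(raw_dir.iterdir()):
    if img_path.suffix.lower() not in {".webp", ".png"}:
        continue
    with Image.open(img_path) as im:
        img = im.convert("RGBA")
    # Composite the image over a solid color so transparent areas become a real background
    background = Image.new("RGBA", img.size, bg_color + (255,))
    background.alpha_composite(img)
    background.convert("RGB").save(img_path.with_suffix(".png"))
    if img_path.suffix.lower() == ".webp":
        img_path.unlink()  # drop the original .webp after converting
```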

Next, consider your training resolution. Higher resolutions let you get more detail out of an image, but will slow your training time. Using SD1.5, most people train at 512 or 768, but intermediary resolutions are also applicable, such as training at 704 if you can't fit 768.

If you're using SDXL, your resolution should be at least 1024, which is what nearly everyone trains at.

Any image that is larger than your resolution will be scaled down automatically. When cropping or resizing, the smallest dimension of the image should generally be no smaller than your training resolution - if it is, consider upscaling the image. Resolution will be detailed further later.

Once you have an idea of your resolution, take a look at your dataset. Keep in mind that non-square images will be scaled to maintain their proportions (based on your bucket resolution), so having lots of empty background can be detrimental to getting details. Wide and tall images with lots of empty background can be cropped to focus on the subject.

Additionally, if you have hi-res images with multiple depictions of your subject (like a reference sheet) you can crop the image into multiple parts, so it gets trained as several images rather than one overly compressed one. Such images can also be trained without cutting and cropping, just be sure to tag them with "multiple views" and "reference sheet" later on.

Images with people other than the subject should be cropped to focus on the subject if you plan to use them as they are; if you plan to turn them into solo images, edit out the other subjects where possible, be it via cropping or removal.

While not necessary, overlaying elements such as text, speech bubbles, and movement lines can be removed. You should remember that AI learns off of repetition, and the same element in the same spot on multiple images will be something it tries to hold on to. It's alright if a handful have them, but ideally you want as little repetition among them as possible, and those that can't be removed but repeat often should be tagged. Since repetition is key, outliers are (usually) less likely to stick. The magic eraser tool is very useful for any of these that aren't on a flat color background.

If you have images smaller than your training resolution, consider upscaling them. Upscalers like 4x_Ultrasharp are great for this. Personally, I've been using 4x_foolhardy_Remacri. Be careful of the upscaler you use, as poorly made upscalers will sometimes introduce their own artifacting on your images, which can negatively impact your results.

If you have images larger than 3k pixels on a side, downscale them to 3k or less. Apparently, the Kohya trainer has some minor issues handling very large images when downscaling them for training - Ensuring your images are below that threshold will help slightly with quality.
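A small sketch covering both of these checks - it only flags undersized images (actual upscaling should go through a proper AI upscaler as described above) and shrinks anything over the 3k threshold with a plain Lanczos resize. The path and thresholds are assumptions to adjust:

```python
# Sketch: flag images whose shortest side is below the training resolution,
# and downscale anything over ~3000 px on its longest side.
from pathlib import Path
from PIL import Image

dataset_dir = Path("Training Folders/Concept Folder/Raw")
train_res = 1024   # 512/768 for SD1.5, 1024 for SDXL
max_side = 3000    # per the note above about very large images

for img_path in sorted(dataset_dir.iterdir()):
    if img_path.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
        continue
    resized = None
    with Image.open(img_path) as img:
        w, h = img.size
        if min(w, h) < train_res:
            print(f"Upscale candidate: {img_path.name} ({w}x{h})")
        if max(w, h) > max_side:
            scale = max_side / max(w, h)
            resized = img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
    if resized is not None:
        resized.save(img_path)  # overwrite with the downscaled copy
        print(f"Downscaled: {img_path.name} -> {resized.size}")
```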

Lastly, if you have a subject with asymmetrical details (like a marking, logo, single robot arm, etc), make sure it is facing the same way in each image. Images incorrectly oriented should be flipped for consistency. If this isn't done, logos and markings will often fail to generate correctly, and larger asymmetrical details could generate on both sides of the object. Should you have these kinds of details, make sure you don't enable "flip augmentation", detailed further in.

Once you've done the above, place all of your images - edited and unedited alike - in a new folder like so: "Training Folders/Concept Folder/X_ConceptName". The 'ConceptName' will be your instance token (what you prompt with), and the 'X' will be the number of repeats of that folder per epoch, which will be detailed later. It should look something like "1_Hamburger".
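Putting the folder naming together, the layout ends up looking roughly like this (with "Hamburger" standing in for your own concept):

```
Training Folders/
└── Concept Folder/
    ├── Raw/             <- unedited originals from Part 1
    └── 1_Hamburger/     <- finished images (plus caption .txt files, added in Part 3)
```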

Part 2.5 | Datasets: Curing Poison

Note (7/26/24): It has been determined that these steps really aren't required in any real way - image poisoning techniques have been found to only work in such niche situations that practically any form is DOA. Generally speaking, the following set of conditions needs to align for poisoned data to have any tangible impact on your training:

  • The poisoning technique was based on the exact same text encoder as your model.

  • The poisoning technique also used the exact same or similar VAE as you're training with.

  • The amount of poisoned data is proportionally higher than unpoisoned data by a sizable margin.

  • The poisoned data covers a wide variety of classes. - This last one is truer for finetuning, but still loosely applies to LoRAs.

It's still a good idea to clean or discard obviously poisoned images, but it's less to combat the poison and more to have a clean image without artifacts. The poison is actually snake oil :)


Since Nightshade especially is getting a lot of traction right now, I figure I'll put a section here covering "poisoned" images. You won't run into these too often, but it's quite possible as they increase in popularity.

The purpose of image poisoning tools like Glaze and Nightshade is to add "adversarial noise" to an image, which disrupts the learning process by effectively adding insane outliers and obscuring the original data it would train on. As such, including a poisoned image in your dataset can result in strange abnormalities, be it color variations, distorted anatomy, etc. The more poison you have, the worse the effects will be. Ideally, you don't have any - but you CAN still use them.

These "poisons" have a hilarious weakness - the very noise they're introducing. By simply taking the desired image, and putting it through an AI upscaler good at denoising (like with jpeg artifacts or the sort), or even just a general upscaler, you can just... strip away the poison. It's that easy, usually. People are still experimenting with the "best" methods for removal, but frankly, especially with Nightshade, pretty much any method can clean the image to a usable state.

"Smoothing" or "Anti-artifact" upscalers work best for the job, used with one of the two following methods:
A: Just upscale it. 2x is usually fine.
B: Downscale to half or 3/4 size, then upscale with AI. Works best with already large images, small resolution images would lose too much detail.

Alternatively, "adverse cleaner" can do a decent job, and exists as an extension for A1111 or as a HF Space. Combined with the upscaling methods above, you can effectively neutralize the "poison" entirely.

"But how do I recognize a poisoned image?"

It depends on how aggressively the work was poisoned - If it looks like someone put a silly camera filter on it, has some pretty obvious artifacting, or looks like the entire image is covered in a JPEG compression artifact - it's 9/10 times poisoned. Less aggressive poisons are harder to detect, but have less of an impact on your training. If you're unsure, take a close look at it in a photo editor, and/or just run it through the cleaning methods above to be safe.

As a general note, the works with hyper-aggressive poisons on them usually aren't even worth using in the first place - Self-respecting artists generally keep the poison minimal to not affect their work visually in any major way, or just don't use it. If you don't feel like dealing with poison, pull your data from older images if you want to be wholly safe, or just learn to identify poisons and skip by them.

Part 3 | Datasets: Tagging

Almost done with the dataset! We're in the final step now, tagging. This will be what sets your instance token, and will determine how your LoRA is used.

There are a variety of ways to do your tagging, and a multitude of programs to assist tagging or do auto-tagging. However, in my personal opinion you shouldn't use auto tagging (especially on niche designs and subjects) as it makes more work than it assists with. (However, auto-taggers are improving rapidly.)

Personally, I use the Booru Dataset Tag Manager and tag all of my images manually. You COULD tag without a program, but just... don't. Manually creating, naming, and filling out a .txt for every image is not what you want to do with your time.

Thankfully, BDTM has a nice option to add a tag to every image in your dataset at once, which makes the beginning of the process much easier.
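Under the hood, that bulk-add feature just amounts to writing/updating one comma-separated .txt caption per image. If you ever want to script it yourself, here's a rough sketch (folder path and tags are placeholders):

```python
# Sketch of "add a tag to every image": create/update one .txt caption per
# image, keeping the shared tags at the front of the tag list.
from pathlib import Path

image_dir = Path("Training Folders/Concept Folder/1_Hamburger")
shared_tags = ["hamburger", "food"]  # instance token first, class token second

for img_path in image_dir.glob("*"):
    if img_path.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
        continue
    caption_path = img_path.with_suffix(".txt")
    existing = []
    if caption_path.exists():
        existing = [t.strip() for t in caption_path.read_text(encoding="utf-8").split(",") if t.strip()]
    # Shared tags go first; keep any other tags already present, without duplicates
    merged = shared_tags + [t for t in existing if t not in shared_tags]
    caption_path.write_text(", ".join(merged), encoding="utf-8")
```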

Before you tag, you need to choose a model to train on! For the sake of compatibility, I suggest you train on a base model - that is, a base release or a finetune that is NOT a mix of other models. Training on a mix is still viable, but in my experience makes the outputs less compatible with anything other than that model. If you only want to use your LoRA on THAT specific mix, you're perfectly fine to train on it, however.

Now, for the tagging itself. Before you do anything, figure out what type of tags you'll be using:

  • Currently, there are three types of prompting styles, as follows: Natural Language Prompting, (dan)Booru Tag Prompting, and e6 Tag Prompting, which you should use based on your model's "ancestry."

  • Base SD1.5, Base SDXL, and most Non-Pony SDXL models use Natural Prompting, ex: "A brown dog sleeping at night, they are very fluffy." Sometimes works on other models, but is not recommended.

  • The vast majority of Anime models you see use Booru prompting, specifically using the tag list from Danbooru, an anime image board. I hear Anything v4.5 is a good choice for 1.5.

  • Models with an ancestry based on furry models use e6 prompting, using the tag list from e621. Fluffyrock or BB95 is a good choice here for 1.5. PonyXL is the best for your XL choices.

  • FLUX models, from what I've seen, are rather flexible in accepting different tagging styles. I still need to experiment with this myself, but it seems natural language is the preferred method, though tagging in a booru-like style works too.

    • When tagging images for FLUX, be aware that you should not use tags like "1girl". FLUX interprets "girl" and "boy" as children of said gender, so use "a woman" or "a man" instead.

Once you know what model and tag style you're using, you can start tagging. But before that, you should be aware of how LoRAs work: A LoRA overrides weights from the model you train on to give them new meaning. If you tag a dress that appears the same in every image as "dress", you will override the model's base knowledge to tell it "dress" actually means the dress from your dataset, not any other dress. Be careful of overriding common tags, as they can fight back, too, making the trained object less coherent.

Your FIRST tag on EVERY image should be your instance token, aka what you named X_ConceptName, in this case "conceptname". If your model already has your subject even remotely trained to that tag already, consider changing your instance token to a string it wouldn't have. For example, "hamburger" could be "hmbrgrlora". This isn't always required, but if you see wacky results that stem from the models original interpretation, you might want to do so.

Your SECOND tag on EVERY image should be your class token, a general "container" for your instance token. This tells the AI what your subject is, generally, to aid in training. For example, a sword is a "weapon", Lola Bunny is an "anthro", and so on. Not every image needs to have the same class token, however! I often run mixed datasets, and having multiple classes for one instance token is perfectly ok - so long as your model can make sense of it.

My process works something like:

  • Add to all images at once: instance token (usually species), class token (anthro/feral/human), gender (if applicable, ferals not specified), controllable elements (ie. a character-specific outfit), nude, other common controllables (like most common eye color).

  • Move to first image; Remove if needed: controllable elements. Change if needed: nude (to general outfit tag(s)), eye color, etc.

  • Add tags you would consider to be "key" elements to the image: Specific mediums (like watercolor), compositionals (ie three-quarter portrait), etc.

  • Add tags to describe deviated aspects: huge/hyper breasts, horns/scales/skin of varied color, etc.

  • Repeat for each image.

That being said, don't go overboard with your tags. If you use too many, you'll "overload" the trainer and get less accurate results, as it's trying to train to too many tags. It's generally best practice to only tag items you would consider a "key element" of the image. Undertagging is better than overtagging, so if in doubt keep it minimal. I usually have ~5-20 tags per image, depending on their complexity.
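If you want a quick sanity check for overtagging, a tiny sketch like this will list the tag count per caption file (the 20-tag threshold simply mirrors the rough guidance here; adjust the path to your own folder):

```python
# Sketch: report tag counts per caption file to help spot overtagged images.
from pathlib import Path

image_dir = Path("Training Folders/Concept Folder/1_Hamburger")

for caption_path in sorted(image_dir.glob("*.txt")):
    tags = [t.strip() for t in caption_path.read_text(encoding="utf-8").split(",") if t.strip()]
    flag = "  <- consider trimming" if len(tags) > 20 else ""
    print(f"{caption_path.name}: {len(tags)} tags{flag}")
```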

Backgrounds and poses can often be ignored or lightly tagged, but if you have specific kinds of locations/poses/BGs in a significant number of your images, you should tag them to prevent biasing.

For example, if you have a lot of white backgrounds, you should tag "white background". If after a training you see a specific pose being defaulted to, you should find all instances in your dataset using that pose and tag it.

You should also be wary of "implied" tags. These are tags that imply other tags just by their presence. By having an implied tag, you shouldn't use the tag(s) it implies alongside it. For example, "spread legs" implies "legs", "german shepherd" implies "dog", so on and so forth. Having the tags that are implied by another spreads the training between them, weakening the effect of your training. In large quantities, this can actually be quite harmful to your final results.

Tagging Color: You may find yourself asking whether or not to tag the color of objects in your dataset. There is no right or wrong answer here, but you should consider what you plan to do with your lora or have it be capable of.

  • If you want to change the color of specific outfit pieces, for example, it's a good idea to tag the color of repeated objects.

  • If color is not tagged for an object that is the same color consistently, the color will be baked into the tag it gets associated with. Some models handle recoloring them ok, others do not.

Tagging low-quality images: Sometimes, you just don't have a choice but to use poor data. Rough sketches, low-res screenshots, bad anatomy, and others all fall into this category.

  • Sketches can usually be tagged "colored sketch" or "sketch", which usually is all you need to do. If uncolored, "monochrome" and "greyscale" are usually good to add, as well.

  • Low resolution images should be upscaled with an appropriate upscaler if possible, such as one of the many made to upscale screenshots from old cartoons, for example. If you can't get a good upscale, use the appropriate tag for your model to denote the resolution quality, but these are sometimes best left out.

  • Bad anatomy should be tagged as you see it, or cropped out of frame if possible. Images with significant deviations that can't be cropped or edited, like the neck/head/shoulders being off-center, are usually best left out of the dataset entirely.

Tagging PonyXL: Pony can be tagged normally as above, but the custom score, source, and rating tags warrant a small explanation of how to use them:

  • Source and Rating tags I do not use, as these are key compositional tags - changing them through training a lora could shift how the model works entirely, so it's best to leave them be.

  • Score tags are similar, but we can use them to a degree.

    • You should not manually score your images: Pony's scores were determined by an unknown image classifier, and selective scoring could damage the model as with the source and rating tags. Without the classifier, we can't reliably do manual scoring without risk to the model.

    • You can, however, do a broad application of "score_4_up, score_5_up, score_6_up, score_7_up" on all of your images. While not required, I and a few other FD members have found that a broad application of these scores keeps the trained concept from being as heavily skewed by the score tags, increasing the quality of smaller details and general coherency. It's not a "magic fix" that will make your lora twice as good, but should be a small improvement for little effort.

  • Pony's special tags do nothing on other models: Using them outside of pony will likely worsen your training.

Once you've tagged all your images, make sure you've saved everything and you'll be good to go for the next step.

Part 3.5 | Datasets: Prior Preservation (Regularization)

While completely optional, another method of combating style bias and improper tag attribution is via the use of a Prior Preservation dataset. This will act as a separate but generalized dataset for use alongside your training dataset, and can usually be used generally between multiple training sessions. I would recommend creating a new folder for them like so: "Training Folders/Regularization/RegConceptA/1_RegConceptName".

"But how exactly do I make and use these?"

You can start by naming your folder after a token - your class token is often a good choice.

Creating a dataset for these is actually incredibly easy - no tagging is required. Within the folder you created for the tag, you simply need to put in a number of random, unique, and varied images that fall within that tag's domain. Do not include images of anything you'll be training. From my own testing, I personally recommend using roughly as many regularization images as you have in your main training dataset, but keep a larger pool of ~50-100 images to pull from, so you don't have to expand your regularization set every time you train with more data than before.

That said, your reg dataset doesn't explicitly have to be varied. While variance is good for general-purpose usage, let's say you have a lot of screenshots in your primary dataset, or many images from the same or similar artist(s). In either case, stylistic bias could be difficult to remove. While tagging the image's styles can help, it isn't always enough to fully separate that style. In this case, you can create a reg dataset of the style specifically: Just chuck a bunch of the artist's works into a folder, take a bunch of screenshots, etc, and then name the reg folder with the appropriate tag.

During training, the trainer will alternate between training on your primary and regularization dataset - this will require you to have longer training to achieve the same amount of learning, but will very potently reduce biasing.

You can also use multiple different regularization datasets in the same training, just put both folders in the regularization directory you set during training. Remember that folder repeats will matter here - you can always rerun training with more repeats of the regularization if it isn't powerful enough, but be wary of increasing your step count too much.

Another thing to note is that you can directly influence the strength of your regularization learning without just adding repeats or more images to the dataset. In the "advanced" tab of the GUI's settings, the setting "Prior loss weight" controls this. By default, it sits at 1: This weights the regularization images the same as your normal training images, which can water down your training. The closer you take the value to 0, the weaker the effect will be. Be warned that you may need more steps than usual when using regularization datasets.

Part 3.6 | Tagging: Examples

Since examples are usually quite helpful, I'll put a handful of examples from my own datasets here for your own reference. Keep in mind: I usually train on fluffyrock and Pony Diffusion, models that use e6 tagging. Other models should swap tags to their own variants where required. (ex: side view (e6) > from side (booru))

mizutsune, feral, blue eyes, no sclera, bubbles, soap, side view, action pose, open mouth, realistic, twisted torso, looking back, hand on ground, white background

  • White backgrounds were more prevalent in this dataset, so the background was tagged.

arzarmorm, human, male, black hair, brown eyes, dark skin, three-quarter view, full-length portrait, asymmetrical armwear, skirt, pouches, armband, pants

  • In this case, the model wasn't cooperating with just the instance token alone, so the tags "asymmetrical armwear, skirt, pouches, armband, pants" were used as reinforcement, which also detached them from the main concept, allowing them to be controlled individually.

  • This LoRA also had very few instances of white backgrounds, so leaving it untagged was a non-issue.

Part 4 | Training: Basics

Now that you have your dataset, you need to actually train it, which requires a training script. The most commonly used scripts, which I also use, are the Kohya scripts. I personally use the Kohya-SS GUI, a fork of the SD-Scripts command line trainer. It is usually a bit behind in updates, but is perfectly usable. Both are valid options, and other options exist, but for the sake of compatibility I'll stick with Kohya GUI as a frame of reference. Most settings should work decently in other trainers, as well.

Once you have it installed and open (Install is actually quite easy.), make sure you navigate to the LoRA tab at the top (it defaults to dreambooth, an older method.)

Upon request by several, I have now included two .json presets that can be directly loaded by the Kohya GUI as attachments to the guide: These are NOT plug-and-play, you still need to set your model and other directories. The adaptive json is set to DAdaptAdam by default but can be swapped to Prodigy w/o issue, and the AdamW json by default utilizes the "Aggressive Lite" LR preset. Both jsons have persistent data loading disabled, and caption shuffle enabled. I highly recommend you still read the guide before making changes, these are intended as starting templates.

There are a lot of things that can be tweaked and changed in Kohya, so we'll take it slow. Assume that anything I don't mention here can be left alone.

Yellow text like this denotes alternative, semi-experimental settings I'm testing. Feel free to give feedback if you do use them, but if you're looking for something stable, ignore these. These settings will change frequently as I test and train with them. Once I'm happy with a stable setup incorporating them, they will be adopted into the main settings.

Green text will denote settings for my SDXL setup. If you're using SD1.5, ignore these.

Pink text will denote settings for my new high-batch settings - I've now tested them to work both with SDXL and SD 1.5, but they require much more VRAM to utilize.

We'll go down vertically, tab by tab.

Accelerate Launch 

This tab is where your multi-gpu settings are, if you have them. Otherwise, skip this tab entirely, as the defaults are perfectly fine. Training precision is also here, and should match your Save precision in the following tab, but you won't touch it otherwise.

  • Mixed Precision: bf16

Model

This tab, as you've likely guessed, is where you set your model for training, select your dataset, etc.

  • Pretrained model name or path:

    • Input the full file path to the model you'll use to train.

  • Trained Model output name:

    • Will be the name of your output file. Name it however you like.

  • Image folder (containing training images subfolders):

    • Should be the full file path to your training folder, but not the one with the X_. You should set the path to the folder that folder is inside of. Ex: "C:/Training Folders/Concept Folder/".

  • Underneath that, there are 3 checkboxes:

    • v2: Check if you're using a SD 2.X model.

    • v_parameterization: Check if your model supports V-Prediction (VPred).

    • SDXL Model: Check if you're using some form of SDXL, obviously.

  • Save trained model as:

    • Can stay as "safetensors". "ckpt" is an older, less secure format. Unless you're purposefully using an ancient pre-safetensor version of something, ckpt should never be used.

  • Save precision:

    • "fp16" has higher precision data, but internally has smaller max values. "bf16" holds less precise data, but can use larger values, and seems faster to train on non-consumer cards (if you happen to have one). Choose based on your needs, but I stick with fp16 as the higher precision is generally better for more complex designs. "float" saves your LoRA in fp32 format, which gives it an overkill file size. Niche usage.

    • bf16

Metadata

A section for meta information. This is entirely optional, but could help people figure out how to use the LoRA (or who made it) if they find it off-site. I recommend putting your username in the author slot, at least.

Folders

As simple as it gets: Set your output/reg folders here, and logging directory if you want to.

  • Output folder:

    • Where your models will end up when they are saved during/after training. Set this to wherever you like.

  • Regularization directory:

    • Should be left empty unless you plan to use a Prior Preservation dataset from section 3.5, following a similar path to the image folder. Ex: "C:/Training Folders/Regularization/RegConceptA/".

    • I have found these settings require little to no regularization, if you happened to be using it before.

Parameters

The bread-and-butter of training. Mostly everything we'll set is in this section: Don't bother with the presets, most of the time.

  • Lora Type: Standard

    • Alt: LyCORIS/LoCon

    • LoCons, after a decent amount of trainings and testings, seem to overfit easier than a standard LoRA, so be wary of that when using them.

      • Used alongside DoRA, I haven't encountered overfit issues with LoCons like before.

      • It seems certain presets can avoid this issue without DoRA, which I'm looking into.

    • LyCORIS/LoCon

  • LyCORIS Preset: Full

    • attn-mlp

  • Train Batch Size:

    • How many images will be trained simultaneously. This will usually speed up your training, but will also increase your vram usage substantially, and having too large a batch size will slow you down, instead of speeding you up.

    • For most modern models (XL and later), it is usually a pure benefit to have as large of a batch size as you can fit on your GPU, to a degree: Your batch size should never exceed half the size of your dataset.

      • Keep in mind, higher batch sizes will require higher learning rates to compensate.

    • There has been some talk about exponential values being best to use (1/2/4/8/16/etc), but in most cases you'll struggle to fit larger values on most consumer GPUs.

    • This is not always beneficial on 1.5-based models, which work much better with small batch sizes. (1-4)

    • Currently, I've been using a batch of 2.

      • New settings use a batch of 8. If you don't have enough vram, try 4.

  • Epoch:

    • A single epoch in steps is the number of images you have, multiplied by the "X_" number.

    • What you set this value to is dependent on your dataset, but as a rule of thumb I start with a number that gets your total step count close to 2000.

      • Ex: "1_Hamburger" has 20 images. The folder repeat (1_) is a value of 1: To reach 2000 steps, we would run it for 100 epochs.

      • Ex: "10_Hamburger" has 20 images. The folder repeat (10_) is a value of 10: To the trainer, it sees this as 200 images. to reach 2000 steps, we would run it for 10 epochs.

      • Both of these methods train the same amount: How you do it is purely personal preference and has 0 impact on your final results.

      • Alternatively, leave this value at 1 and set your step count directly in the "Max train steps" value.

    • While not perfect, a 2000-step target makes for a good starting value to test your dataset.

      • Some trainings may need more or less, depending on their complexity and your learning rate: Don't be worried if you massively overshoot that number, as more complex concepts can often require up to 4000 steps.

    • Keep in mind, changing your batch size or gradient accumulation steps will change the visible step count during training, but still train for the same total amount.

      • Ex: 1000 steps at batch size 2 is equal to 2000 steps at batch size 1.

      • Ex: 500 steps at batch size 2 with a gradient accumulation of 2 is also equal to 2000 steps at batch size 1.

      • This is important to keep in mind for these settings, as we'll use a batch of 8 w/ a gradient accumulation of 2 for an effective batch of 16: In this case, 125 steps will equal the goal of 2000.
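If the step math above feels abstract, here's a tiny sketch of how the numbers combine - the examples mirror the ones in this section:

```python
# Sketch of the step arithmetic: folder repeats, epochs, batch size, and
# gradient accumulation combine into the step count shown during training.
def visible_steps(images, repeats, epochs, batch_size=1, grad_accum=1):
    images_per_epoch = images * repeats            # what the trainer "sees" per epoch
    total_image_passes = images_per_epoch * epochs
    return total_image_passes // (batch_size * grad_accum)

print(visible_steps(20, 1, 100))                              # 2000 - "1_Hamburger" for 100 epochs
print(visible_steps(20, 10, 10))                              # 2000 - "10_Hamburger" for 10 epochs
print(visible_steps(20, 10, 10, batch_size=2))                # 1000 - same training, fewer visible steps
print(visible_steps(20, 10, 10, batch_size=8, grad_accum=2))  # 125  - the big-batch setup
```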

  • Max train epoch:

    • Optional. Forces your training to stop at X epochs, useful in some scenarios. Overridden by next value.

  • Max train steps:

    • Optional. Forces training to stop at the exact step count provided, overriding epochs. Useful if you want to stop at a flat 2000 steps or similar.

  • Save every n epochs:

    • Optional. Saves a LoRA before it finishes every X number of epochs you set. This can be useful to go back to and see where your sweet spot might be.

    • I usually keep this at 1, saving every epoch. If the final epoch is overtrained, I go backwards to find the best version. I recommend larger values for high epoch counts.

    • Your final epoch will always be saved, so setting this to an odd number can prove useful: Saving every 3 epochs on a 10-epoch training will give you epochs 3, 6, 9, & 10, giving you a fallback right at the end if it started to overbake.

  • Cache latents & Cache latents to disk:

    • These affect where your data is loaded during training. If you have a recent graphics card, "cache latents" is the better and faster choice which keeps your data loaded on the card while it trains. If you're lacking VRAM, the "to disk" version is slower but doesn't eat your VRAM to do so.

    • Caching to disk, however, prevents the need to re-cache the data if you run it multiple times, so long as there weren't any changes to it. Useful for tweaking trainer settings.

  • LR Scheduler:

    • When using Prodigy/DAdapt, use only Cosine. When using an Adam opt, Cosine With Restarts is usually best. Other schedulers can work, but affect how the AI learns in some pretty drastic ways, so don't mess with these until your understanding of them is better.

  • Optimizer:

    • There are a number of options to choose from, but the four worth using IMO are Prodigy, DAdaptAdam, AdamW, and AdamW8bit.

    • Prodigy is the newest, easiest to use, and produces exceptional results.

    • The AdamW optimizers are quite old, but with fine tuning can produce results better than Prodigy in less time.

    • For the purposes of this guide, we'll be using Prodigy and AdamW.

    • DAdaptAdam is very similar to Prodigy, and these settings should be largely applicable to it, as well. It has a less aggressive learning method, so if you're having issues with Prodigy try this out.

    • For my new settings, I'm using a custom optimizer made by LodestoneRock, creator of the FluffyRock family of SD 1.5 models. Specifically, I'm using Torchastic, their latest iteration of the optimizer.

      • As this is custom, individuals training on-site or in collabs will not have access to this optimizer - Sorry!

      • For local trainers, this can be installed by placing the "stochastic_optim.py" file in kohya_ss\sd-scripts\library

      • We will point Kohya to this file later: The optimizer dropdown won't show it, and can be set to anything, as it will be manually overridden.

  • Optimizer extra arguments:

    • Prodigy/DAdapt: Set to "decouple=True weight_decay=0.1 betas=0.9,0.99".

    • AdamW: weight_decay=0.11 betas=0.9,0.99

    • Torchastic: betas=0.9,0.999
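For context, the "extra arguments" box is just a list of keyword arguments handed to the optimizer's constructor. As a rough illustration, the Prodigy line above is roughly equivalent to building the optimizer like this (assuming the prodigyopt package; the dummy parameters stand in for the LoRA weights):

```python
# Rough sketch: the "Optimizer extra arguments" string becomes keyword
# arguments on the optimizer constructor. Not Kohya's actual code.
import torch
from prodigyopt import Prodigy

params = torch.nn.Linear(4, 4).parameters()  # dummy stand-in for the LoRA network weights

optimizer = Prodigy(
    params,
    lr=1.0,               # the "Learning Rate: 1" setting - Prodigy adapts from here
    decouple=True,
    weight_decay=0.1,
    betas=(0.9, 0.99),
)
```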

  • "Learning Rate":

    • When using Prodigy/DAdapt, set this to 1. Prodigy and DAdapt are adaptive and set this automatically as it trains.

      • As a general note, the specific "text encoder" and "unet" learning rate boxes lower down will override the main box, if values are set in them.

  • LR warmup (% of total steps):

    • Optional. 10% is a good value in most scenarios. For simpler concepts a model is already mostly aware of (like anime characters), 5% seems to be a decent choice, too.

    • SDXL: 20%

    • Torchastic: 0%

  • "LR # cycles":

    • If using an Adam opt, set this to 3. Only affects specific schedulers that utilize restarts.

  • "Max resolution":

    • For most SD 1.5 models, you'll want this set to 768,768.

    • Models that allow for larger native generation (like SDXL for example) should use larger values like 1024,1024.

    • You should not set this to be larger than your model can generate natively. Less powerful cards can train at 512,512, but will have reduced quality.

      • Alternatively, many models based on the old NAI leak (most SD1.5 anime models), can be trained at 640,640.

    • Even if your model can generate quite large, I highly recommend keeping the training resolution at the standard minimum value - increasing your training size will substantially increase your vram usage and slow down your training for a diminishing return on quality.

    • SDXL: Do not set lower than 1024.

  • Enable buckets: True.

    • This groups similarly sized images together during training. This should *always* be on.

  • Min/Max bucket resolution:

    • The values set here affect the minimum and maximum sizes of your images respectively, based on their shortest/longest side.

    • If a side of one of your images is smaller than the minimum size, it will be scaled up.

    • If a side of one of your images is larger than the maximum size, it will be scaled down.

      • When the bucket does this, it will attempt to best preserve the aspect ratio of the image, dependent on "bucket resolution steps" (see advanced). If the image cannot be scaled into a bucket perfectly, it will be center-cropped to fit.

      • Especially on extra tall/wide images, this can absolutely crop portions of the subject - If you notice this occurred, crop the image manually.

    • Generally, I set the maximum to twice the size of my training resolution, but 1.5x is also valid. This mostly affects tall/wide images.

    • SD 1.5: Max 960

    • SDXL: Max 2048
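As a simplified illustration of what bucketing does with these values (not Kohya's exact algorithm - just the general idea of scaling toward the training area, clamping to the min/max, and snapping to the bucket resolution steps covered in the Advanced subtab):

```python
# Simplified illustration of bucketing: scale the image toward the training
# area, clamp each side to the min/max bucket size, snap down to multiples of
# the bucket resolution step. Anything that doesn't fit exactly is center-cropped.
import math

def approximate_bucket(width, height, train_res=1024, min_bucket=256,
                       max_bucket=2048, step=64):
    # Scale so the pixel area roughly matches train_res * train_res
    scale = math.sqrt((train_res * train_res) / (width * height))
    w, h = width * scale, height * scale
    # Clamp to min/max bucket resolution, then snap down to the step size
    w = max(min_bucket, min(max_bucket, w))
    h = max(min_bucket, min(max_bucket, h))
    return (int(w // step) * step, int(h // step) * step)

print(approximate_bucket(1024, 1024))  # (1024, 1024) - square stays square
print(approximate_bucket(1200, 1600))  # roughly (832, 1152) - a portrait bucket
print(approximate_bucket(4000, 1000))  # very wide images run into the max bucket cap
```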

  • "Text Encoder & Unet learning rate":

    • SD1.5 AdamW: I have these set to 0.00005 and 0.0001 respectively.

    • SDXL (Standard): 0.0001 & 0.0003

    • SDXL (Aggressive Lite): 0.0003 & 0.0003

    • SDXL (Aggressive): 0.00045 & 0.0004

    • Torchastic: 0.00015 & 0.0004

      • "Standard" values will be suitable for simpler concepts that a model should find easy to learn.

      • "Aggressive" values are better suited for multi-concept trainings and concepts that may be particularly tricky for the AI to pick up on in training. May be more prone to overfitting depending on your dataset, use with caution.

        • The "Lite" values are still fairly aggressive, but are less prone to overfitting than the fully aggressive parameters - If "Aggressive" is too much for your concept, try Lite.

  • "SDXL Specific Parameters":

    • This subcategory only appears if you've checked the SDXL box prior.

      • Cache text encoder outputs: Can reduce VRAM usage, but is incompatible with caption shuffle/dropout/etc. Good to use if you're not using anything incompatible with it.

      • No half VAE: Should always be True, imo, just to save you the headache.

  • "LyCORIS":

    • This subcategory only appears if you're using a LyCORIS lora type.

      • DoRA Weight Decompose: This setting allows you to use the new DoRA training method.

        • DoRA changes the way a LoRA learns significantly, training it more like a full finetune than a standard LoRA - This results in slower training times, but higher quality and coherency, especially with smaller details.

        • After testing, I've found that for character training DoRA isn't worth it: On average, it would double my training time for pretty meager improvements. It may be more useful for style training, though.

  • Network Rank & Network Alpha:

    • These affect how large your file will be and how much data it can store. What you set this to will be dependent on your subject.

      • Examples are formatted Rank/Alpha.

    • SD 1.5: 32/32 is a good general starting value.

    • SDXL/Torchastic: 16/16

      • In the vast majority of cases, you should never increase your values higher than 64. This will bloat your file size significantly and may potentially damage your final results.

      • Your Alpha should be kept to the same number as your Rank, in most scenarios.

      • Using DoRA, it seems like you can easily halve your values (including Conv values), while retaining quality.

    • Adaptive optimizers like Prodigy, Adafactor, and DAdapt should always set their alpha to 1.

    • When not using adaptive optimizers, there's some talk of using an alpha that's actually much higher than your rank, following the equation alpha = (net dim * sqrt(net dim)), which should better preserve learning rates. Common values using this would be 64/512, 32/181.02, 16/64, & 8/22.63, as rank/alpha respectively.

      • Testing this showed some interesting results, but I've opted to not use it.

  • Convolution Rank & Alpha:

    • Rank of 16 w/ an alpha of 1. Going higher than 16 seems to give diminishing returns, and may actually harm outputs.

    • Torchastic: 4/4

  • Scale weight norms:

    • This assists your LoRA in working well with other LoRAs in tandem, but can be semi-destructive to your output.

    • Personally, I recommend using a regularization dataset instead of using this.

      • If you plan to use your LoRA with other LoRAs, set this value to 1.

      • If your LoRA will likely only ever be used on its own, leave at 0.

      • With this enabled, weights that get too "heavy" are scaled down, reducing their impact. This allows multiple LoRAs to work in tandem by not fighting over values, but in some instances CAN negatively affect your final outputs.

      • Setting to values higher than 1 will reduce the impact, but also reduce cross-compatibility.

      • The scaling seems to have significantly less of a negative impact on LyCORIS training, given the learning is spread over more weights. Can usually be kept at 1 without worry.

      • Currently, I've opted to keep this at 0 from now on.

  • Network Dropout:

    • Recommended, but optional. A value of 0.1 is a good, universal value. Helps with overfitting in most scenarios.

Advanced (Subtab)

We won't touch much here, as most values have niche purposes.

  • Gradient accumulate steps:

    • This value effectively multiplies your batch size by the set value - It doesn't use extra vram, but instead lets the trainer run the batch X times before averaging them and using that to update the weights.

    • Torchastic: 2

      • If you reduced your batch size to 4, set this to 4 as well.

  • Prior loss weight:

    • Specifically for regularization datasets, value should depend on desired strength.

      • Use a value from 0 to 1, which acts as a percent of strength. 0 does nothing, 1 is equal to 100%.

      • In most cases, 1 will be fine.

  • Additional parameters:

    • If your model supports zSNR, use "--zero_terminal_snr".

    • If using the custom optimizer, use "--optimizer_type library.stochastic_optim.Compass"

      • This will override the selected optimizer in the dropdown to use the new file directly.

      • To use other custom optimizers, the command is structured like library.<filename>.<class name>
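Putting the custom optimizer pieces together - the file placement from the Optimizer section plus the override flag above - the setup looks like this:

```
kohya_ss/
└── sd-scripts/
    └── library/
        └── stochastic_optim.py   <- the custom optimizer file goes here

Additional parameters: --optimizer_type library.stochastic_optim.Compass
```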

  • Scheduled Huber Loss:

    • Allows selecting a new type of loss - I'm not well versed on what they all change, but they differ slightly from the traditional "L2" loss.

    • Torchastic Settings:

      • Loss type: huber

      • Huber schedule: snr

      • Huber C: 0.1

  • Keep n tokens:

    • For use with caption shuffling, to prevent the first X number of tags from being shuffled. I usually keep this set to 2 or 3, keeping the first few tokens from being shuffled.

    • If using shuffling, this should always be 1 at minimum, which will prevent your instance token from being thrown around. I recommend 2 so both the instance and class tokens are kept together.

  • Clip skip:

    • Should be set to the clip skip value of your model. Most anime & SDXL models use 2, most others use 1. If you're unsure, most civit models note the used value on their page.

  • Full bf16 training:

    • True

  • Gradient Checkpointing:

    • Check to save VRAM at a slight speed cost. Has no effect on output quality.

    • This should practically always be turned on when training SDXL.

  • Shuffle Caption:

    • Optional. If true, this will shuffle the tags (outside of the first X kept in place by "keep n tokens") every time the image is trained, which helps with general flexibility.

    • Some consider this useless or a "cope", but its usefulness varies with your dataset. It also adds randomization into your training, and with it on, running the same training twice can give you two slightly different LoRAs in the end.

  • Persistent Data Loader:

    • Optional. This option keeps your images loaded in-between epochs. This eats a LOT of your VRAM, but will speed up training. If you can afford to use it, use it.

  • Memory Efficient Attention:

    • Use only if you're not on a Nvidia card, like AMD. This is to replace xformers CrossAttention.

  • CrossAttention:

    • xformers, always. (As long as you're on a Nvidia card, which you really should be.)

      • If for whatever reason you can't use xformers, SDPA is your next best option. It eats more ram and is a bit slower, but it's better than nothing.

  • Color augmentation:

    • Do not.

  • Flip Augmentation:

    • Optional. This allows you to essentially double your dataset by randomly mirroring your images horizontally during training. This can be especially useful if you have few images, but DO NOT use this if you have asymmetrical details that you want to preserve.

    • Seems to be useful with these settings, if you don't care about asymmetric details.

  • Min SNR Gamma:

    • For LoRA training, this should nearly always be 1. Speeds up training slightly.

      • The reason is a bit complex, but going above 1 while training LoRAs can actually be harmful to your training, as the paper-recommended value of 5 assumes you're training on raw, random latents. In LoRA training, your latents are neither raw nor anywhere near as random, so a value of 1 is what should be used.

      • For now, I've settled on a value of 1.5, which has yielded solid results.

  • Debiased Estimation Loss:

    • False.

      • Helps with color deviation, and supposedly makes training need fewer steps.

      • Has minor incompatibilities with Min SNR Gamma - use one or the other.

  • Bucket resolution steps:

    • 64

      • The size, in pixels, that separates your buckets during training. The smaller this number is, the better your aspect ratios will be preserved, but the more buckets you will have. Conversely, making this larger will decrease the number of buckets, but will damage your aspect ratios more.

      • Generally, the fewer buckets the better, but for LoRA training 64 is generally the best balance - I would only recommend smaller sizes if you have a sizable amount of data.

  • Random crop instead of center crop:

    • This should almost always be false. Prevents usage of many caching options.

      • Changes the behavior of bucket cropping to crop a random section of the image rather than the center when handling oversized images. Usually does more harm than good.

  • Noise offset type:

    • Original

    • Multires

  • Noise offset:

    • 0

    • SDXL: 0.03

  • Multires noise iterations:

    • 6

  • Multires noise discount:

    • 0.3 - 0.1

      • Still determining the best value, go with 0.3 if you're unsure.

  • IP noise gamma:

    • 0.1

      • Adds additional noise, very potently assists regularization.

And that's everything! Scroll to the top, open the "configuration" dropdown, and save your settings with whatever name you'd like. Once you've done that, hit "start training" at the bottom and wait! Depending on your card, settings, and image count, this can take quite some time.

Part 5 | Q&A

This section is reserved for tips, tricks, and other things I find handy to know that don't quite fit elsewhere. I'll try and update this periodically.

Q: I see other guides saying to set your Network Alpha to half of the Rank, why don't you?

A: That's a fairly old misconception that still gets thrown around a lot. Alpha functionally acts as a means to change your learning rate: Setting it to half your Rank is effectively half the learning rate. It doesn't hurt to have it at half or even lower, but you will likely need a longer training.

Q: My training script is showing a loss value that keeps changing as training goes, what is it?

A: For most cases, you don't need to worry about loss, nor should you worry over specific values or ranges. The only time you should pay attention to it is if you see it around a certain range for most of the training, just for it to make a massive change later on. That's a sign something may have gone wrong, or it started to overtrain.

Q: How do I tell if my LoRA is under/overtrained?

A: Both should be fairly obvious, even to the untrained eye. If you're undertrained, you'll likely see "mushy" or incomplete details, or a very low adherence to details. If you're overtrained, you may have odd, over-saturated colors, style biasing, pose biasing, etc. These will vary depending on your dataset, so keep an eye out.

Q: You briefly talked about fp16 and bf16, but what are the "full" versions I'm seeing?

A: "Standard" fp/bf16 use mixed precision, while the "full" versions don't. It's misleading, but the full versions hold less precise data, but can be incredibly fast to train with. I'm sure they have their uses, but in most cases you're perfectly fine in staying with mixed precision.

Q: I keep seeing mentions of "Vpred", what exactly is it?

A: Vpred, or V-Prediction, or V-Parameterization, are all the same thing. While I don't fully understand it at a technical level, as far as I am aware it is an optimization to the noise schedulers that "predicts" outputs during image generation, allowing for a final result to be generated in fewer steps.

Q: What is Min SNR? zSNR? Zero Terminal SNR? Are they the same, or different?

A: They're different - while similar, they do rather different things. To keep it simple, zSNR (Zero Terminal SNR) is a technique that allows the AI to generate using a wider color space, including perfect blacks. Think of it like the difference between a normal monitor and an HDR OLED monitor. Min SNR is a method of accelerated training convergence, which allows models to train in fewer steps.

Q: Could I train at a resolution higher than what my training model can do?

A: Can you? Yes. Should you? No. While normally higher resolutions are a tradeoff of quality for speed, in this case you would be trading speed for worse results. Without getting technical, training larger than your model can handle is not good for your outputs.

Q: You mentioned not to "overtag" your images, but how many is too many?

A: This will really depend on your dataset and training settings. Longer trainings can help with overtagging, but run a greater risk of overtraining. Generally, try and keep your per-image total to 20 or below on average, but having outliers with more isn't the worst. Try and avoid tags that aren't important to the image (unless you're finding that the results are clinging to something too much, in that case tag it), and tags that your model has little to no knowledge of. Empty tags are seen as training targets, and will try to be filled. If filled with the wrong data, you can end up with seemingly random tags being required to get the intended result.

Q: What's the difference between a LoRA and a LyCORIS? Are they even different?

A: Every LyCORIS is a LoRA, but not every LoRA is a LyCORIS. LyCORIS specifically refers to a subset of newer LoRA training methods (LoCon, LoHa, LoKR, DyLoRA, etc.). All of these are still LoRAs, but their new methodologies make them structurally different enough to have their own designation. Now that most GUIs have built-in support for them, to an end user they functionally make no difference in their usage. LoRA on its own simply refers to the original method.

Q: My LoRA kinda works, but has very strange, distorted anatomy at times. What happened?

A: More often than not, distorted anatomy originates from your dataset. Look it over for images that are similar to the distortions you are seeing. Uncommon poses, strange camera angles, improperly tagged duo/group images, and other outliers can be likely causes. Try tagging what's applicable, but it's usually best to remove the image entirely or crop out the parts causing issues, if possible.

Q: I've heard a bit about single-tag training, what is it?

A: Training with a single tag is a very old method commonly used by beginners who don't want to spend time tagging. When training to a single tag, the AI will "homogenize" everything it learns from an image into the tag, resulting in highly generalized outputs. This will only even begin to work if every image is of a specific subject (like a character), and has a very high likelihood of latching on to specific backgrounds, poses, and other unwanted variables. If used with anything else that isn't repetitive, you'll end up with what is effectively digital mush. I would not recommend this for any application.

Q: I've seen other people say to tag their images in a different way than you do, not having any tags to describe the subject outside of their primary token. Is that better? Worse?

A: Neither! It's just a different method of tagging - it is, however, much less flexible. Take a character, for example: tagging the character as nothing but the character tag can make it difficult to change their eye color, outfit, or other details. If you don't care about that, it's perfectly valid, though! By contrast, my method is much more flexible, but getting the exact likeness of the character will require multiple tags.

Part 6 | Advanced Training: Multi-Concept LoRAs & Folder Trickery

Multi-Concept Training

So you've got your feet wet, and want more of a challenge? Or maybe you've got a character with many outfits? Gender-specific armor? That's where multi-concept training comes in.

The actual training settings for these are almost exactly the same compared to normal LoRAs, with a few caveats:

  • Be careful with using flip augmentation, as it will apply to every image, not just one concept.

  • Depending on how many concepts you're training, and how complex they are, you may want to increase your Rank and Alpha values. I recommend trying 32 first and seeing how it performs.

Now, gather your images the same way I detailed before, but separate them based on their concepts (outfits, armors, etc). Any editing, too, should be done like before.

Once you've fully prepared your data, figure out which concept has the most data, and in your concept folder, create a 1_conceptname folder for it.

Now, do the same with your other concepts, obviously replacing "conceptname" with their instance token.

Once you have your folders named and filled, do the following:

  • Take the number of images in your largest folder, then multiply it by the folder's "X_" repeat count to get your total step count. (images*folder repeats) = steps

    • For example, Folder A has 51 images, Folder B has 43 images. Folder A would be used. Assuming 10 folder repeats, that gives us 510 steps for Folder A. (51*10)

  • Now, divide the step count by the number of images in your second largest folder. The resulting number, rounded to the nearest whole number, is what that folder's "X_" should be changed to.

    • So, Folder B has 43 images. (510/43) = 11.86, which we round up to 12. We now have 10_FolderA and 12_FolderB.

  • Repeat this for every applicable folder.

    • Folder C has 32 images, so we compare it to Folder A just like before. (510/32) = 15.94, which we round up to 16.

  • In our example, we now have three folders balanced together. These could be left as is, or, since they are all divisible by two, you can reduce each X_ by half to get 5_, 6_, & 8_ respectively. Remember you will be multiplying these by your epochs! (See the sketch below for this math in code form.)
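
If it helps, here's a minimal Python sketch of the balancing math above. The folder names and image counts are just the hypothetical ones from the example, and the printed prefix is what each folder would be renamed to:

# Hypothetical example folders from above: name -> image count.
folders = {"FolderA": 51, "FolderB": 43, "FolderC": 32}
base_repeats = 10  # repeats assigned to the largest folder

# Every folder aims for roughly the same (images * repeats) total.
target_steps = max(folders.values()) * base_repeats  # 51 * 10 = 510
for name, count in folders.items():
    repeats = round(target_steps / count)  # round to the nearest whole number
    print(f"{repeats}_{name}: {count} images * {repeats} repeats = {count * repeats} steps")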

Why do this, you ask?

We do this to balance the dataset. If you keep everything the same, the folder with the most images will dominate the training, leaving the other concepts with only a fraction of the attention. Balancing ensures every concept gets roughly equal training time, which prevents one from dominating and the others from undertraining.

Keep in mind, however, that if a concept folder has very few images, that individual concept can overbake even if the rest of the LoRA is fine. This becomes a bigger issue the larger the discrepancy between it and the largest folder.

Now that your folders are balanced, we should look at how you name them, and what your activator tag for each will be.

If you're training a character with multiple outfits, be sure to name your folders so you can distinguish them!

If you're training something not tied to a character, like gendered armor, I usually just create a tag for each version. For example, "armortagm" and "armortagf" for males and females respectively. Just like before, these should be the first tag on their respective images.
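
For illustration, the first tags of a caption for each version might look something like this - everything after the activator tag is just a made-up example, not a required format:

armortagm, armor, male, standing, simple background

armortagf, armor, female, standing, simple background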

Now that your names and activator tags are settled, you can start tagging! This can be done just like a normal LoRA, you've just got a whole lot more images to go through.

And that's it! Once you've tagged everything, you can train it just like before. You'll likely have much longer training times, given the increase in images, but in the end you'll have multiple concepts in a single LoRA to use as you please.

Folder Trickery

Sometimes, working with only the best, most accurate data can end up not being enough to train effectively - or maybe you just want to pad things out for better flexibility. Regardless, in most cases you'll find yourself with subpar or slightly-incorrect data lying around that you might not want to use outright. Behold the solution: Folder repetitions!

If you read the above section, you probably saw the dataset balancing segment - We'll do something similar here.

Let's say you have a dataset of 60 images - 30 good, quality images and 30 subpar images. While you could shove them all in the same folder and call it good, you don't want the subpar images skewing your result too much. Instead, split them into two folders!

By default, since each folder has an equal image count, keeping them both at equal repetitions will give each folder 50% of the training - no different than if they were combined. So what if we want to give the subpar data less weight?

The solution is to increase the repetitions of the good data: Assuming equal image counts, treat the repeats like a ratio between the data.

  • 1_ & 1_ will give each set 50% of training (1 out of every 2 steps).

  • 2_ & 1_ will give the 2-repeat set 66% of training (2 out of every 3 steps).

  • 3_ & 1_ will give the 3-repeat set 75% of training (3 out of every 4 steps).

  • So-on and so-forth: You get the idea. (A small sketch of this math follows below.)

While it gets trickier to balance with different image counts, the same concept applies across all folders in your training directory.
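
To make the weighting concrete: a folder's share of each epoch is its (images * repeats) divided by the total across all folders. Here's a tiny Python sketch with hypothetical folder names, mirroring the 3_ & 1_ example above:

# Hypothetical folders: name -> (repeats, image count).
folders = {"3_good_data": (3, 30), "1_subpar_data": (1, 30)}

total_steps = sum(repeats * count for repeats, count in folders.values())
for name, (repeats, count) in folders.items():
    share = (repeats * count) / total_steps
    print(f"{name}: {repeats * count} steps per epoch ({share:.0%} of training)")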

Part 7 | Advanced Training: LyCORIS & Its Many Methods

LyCORIS gets more advanced by the day, and as it increases in commonality I feel it best to have a section talking about it. This will be slightly more technical than the rest, but I'll try to keep it to the "need-to-know" stuff.

LyCORIS Types:

  • LoCON: A LoRA that also affects the convolution layers of the base model, allowing for more dynamic outputs.

  • LoHa & LoKR: Variations that split the LoRA into two low-rank parts, which are combined via the Hadamard Product and Kronecker Product respectively. They take longer to train, and are more oriented towards generalized training.

  • DyLoRA: Short for Dynamic LoRA, this is a LoRA implementation that allows the Rank to change dynamically, but is otherwise a normal LoRA.

  • GLoRA: Short for Generalized LoRA, this is an implementation that is made for generalizing diverse datasets in a flexible and capable manner.

  • iA3: Instead of affecting rank like most LoRA, iA3 affects learned vectors, resulting in a very efficient training method. Similar (seemingly a bit better?) to a normal LoRA, in a much smaller package.

  • Diag-OFT: This implementation "preserves the hyperspherical energy by training orthogonal transformations that apply to outputs of each layer". In short, this type is better at preserving the base model's original understanding of items that are coincidental to the training (like backgrounds and poses). This also apparently converges (trains) faster than a standard LoRA.

  • Native Fine-Tuning: Also known as dreambooth, which we aren't focusing on and will ignore for this guide. The LyCORIS implementation allows it to be used like a LoRA, but it produces very large files.

"So, what should I use?"

I would personally say each has its own uses, so I've categorized them semi-generally. I'm still not super knowledgeable about their intricacies, but I've largely based these on their official implementation notes and documentation. What you choose is up to you and entirely based on your needs. (A rough command-line sketch for selecting these types follows the usage notes below.)

  • General Purpose:

    • LoCON, DyLoRA, iA3, Diag-OFT

  • Multi-Concept:

    • LoCON, LoHa, LoKR

  • Concepts:

    • LoCON, LoHa, LoKR, GLoRA

  • Styles:

    • LoCON, GLoRA, iA3

Benefits, Drawbacks & Usage Notes:

  • LoCON:

    • Widely Applicable

    • Affects More Model Layers

    • Slightly Larger Files

    • Basically Just A LoRA, But Better

      • Dim <= 64 Max, 32 Recommended

      • Alpha >= 0.01, Half Recommended (When not using an adaptive optimizer)

  • LoHa & LoKR:

    • Good With Multi-Concept Training

    • Good With Generalization

    • Longer Training Times

    • Bad With Highly Detailed Concepts

    • Can Be Hard To Transfer

      • LoHa

        • Dim <= 32

        • Alpha >= 0.01, Half Recommended (When not using an adaptive optimizer)

      • LoKR

        • Small: Factor = -1

        • Large: Factor = ~8

  • DyLoRA:

    • Automatically Finds Optimal Rank

    • Longer Training Times

    • Otherwise Just A LoRA

      • Use with large (~128) Dim, Half Alpha (When not using an adaptive optimizer)

      • Use Gradient Accumulation

      • Batch Size of 1 Max

  • GLoRA:

    • Very Good At Generalization (Styles & Concepts)

    • Shorter Training Times (?, To Test)

    • Not Very Good At Training Non-Generalized Subjects

  • iA3:

    • Very Small File Sizes

    • Generally Applicable

    • Generally Performs Better Than LoRA

    • Good With Styles

    • Can Be Incredibly Hard To Transfer

      • Use with High LR (When not using an adaptive optimizer), official implementation recommends 5e-3 (0.005) ~ 1e-2 (0.01)

  • Diag-OFT:

    • Faster Training Time

    • Better Preserves Coincidentals

    • Generally Applicable
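
If you're running the kohya sd-scripts from the command line rather than through a GUI, these types are chosen via the LyCORIS network module and its arguments. The line below is only a rough sketch based on my reading of the LyCORIS README - treat the exact argument names as assumptions and double-check them against the version you've installed:

--network_module=lycoris.kohya --network_dim=32 --network_alpha=16 --network_args "algo=locon" "conv_dim=32" "conv_alpha=16"

Swapping "algo=locon" for loha, lokr, dylora, glora, ia3, or diag-oft should select the other types listed above, with type-specific options (like LoKR's factor) passed as extra network_args.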

Part 8 | Advanced Training: Styles & Themes

So, you want to train a style of some kind. Regardless of what it is, for broader concepts a LyCORIS is the tool for the job, but unlike a LoRA, there are several kinds of LyCORIS to choose from. If you skipped Part 7, I recommend a LoCON, but a standard LoRA also works.

Once you've chosen your type, make sure your rank is set to 32 or lower. LyCORIS seems to run into problems above certain ranks (though you can go as high as 64, I believe), with 32 being the generally agreed-upon maximum before issues start to appear.

Now that that's out of the way, you should start building a dataset, just like before. However, style trainings benefit much more from larger datasets, so instead of the 15-50 range from before, look to gather around 50-200 images; in my experience, 125-150 is a good place to be.

Once you've got your images, start tagging. You can generally tag the same way as before, but keep in mind that you want the style, not a character or article of clothing. You should especially be sure to tag backgrounds, clothing, and any other key element.

After tagging, you're good to start training. In my experience, these usually converge in fewer steps than a concept LoRA: while I recommend ~2000 steps for a concept, styles can bake in fewer, though your mileage may vary given the size and composition of your dataset.

Part 9 | Post Training: LoRA Resizing (SDXL)

As many people will notice, SDXL LoRAs are big. They take up way more space than 1.5 LoRAs, eating up those precious gigabytes of your storage. Luckily, we have a solution: Resizing.

While this process is entirely optional, you can reduce your file size significantly and losslessly, too!

While the Kohya scripts have a way to do this, they are slow and unoptimized, so we'll use a newer project from a fellow FD member, Gaeros. Currently the project only supports SDXL LoRAs and LoCONs, but support for other methods may come in the future. Download the project from its GitHub, and make sure you have its dependencies installed.

After it's installed, open up a powershell/cmd window in the installed folder, and enter the following after you tweak it:

python resize_lora.py /path/to/checkpoint.safetensors /path/to/lora.safetensors -o /path/to/output/folder -r spn_ckpt=1,thr=-1.4

Which should look something like this, when filled out properly:

python resize_lora.py G:\stable-diffusion-webui\models\Stable-diffusion\ponyDiffusionV6XL.safetensors G:\LoRA_Output\MH_Dodogama_a1_8_PonyXL.safetensors -o G:\LoRA_Output -r spn_ckpt=1,thr=-1.4

Provided the dependencies are happy and the paths point to the correct files, the program will, on its first run, cache key values from your selected checkpoint, which it can then reuse as many times as needed when resizing other LoRAs against the same model. That first-time caching is the longest you'll ever wait; future resizes take literal seconds.

These specific settings are my go-to for a safe, lossless resize, which I've found to work well in every instance I tested it. If you're feeling like messing with it, you can go for more aggressive resizes, but your model quality may drop, in some cases considerably.

Alternatively, I've more recently been using "spn_lora=1,thr=-0.7" instead, which I found better mirrors the settings used by the kohya resize - This method culls a little bit of data to trim outliers, which can sometimes improve quality.

Changelog

12/2/24

  • Part 1:

    • Minor tweak to recommended dataset amounts.

  • Part 3.5:

    • Changed reg value description to better align with effect.

  • Part 4:

    • Updated explanation of max bucket resolution, included minimum

    • Added explanation for bucket resolution steps, random crop, and IP noise gamma

  • Part 6:

    • Added section regarding folder balancing tricks.

10/17/24

  • Part 4:

    • Changed "Keep n tokens" description & values.

  • Part 6:

    • Generally updated, removed old info.

10/8/24

  • Part 2:

    • Tweaked some small sections for clarification.

  • Part 3:

    • Added subsection covering findings regarding the PonyXL custom tags.

  • Part 4:

    • Added new big-batch settings.

    • Added sections and basic explanations for Gradient Accumulation and Huber Loss.

    • Added command for using custom optimizers to the Additional Parameters.

  • Part 9:

    • Added alternate resize setting.

8/23/24

  • Part 4:

    • Changed Min SNR Gamma value to reflect new findings.

    • Expanded on Prior Loss Weight.

  • Part 8:

    • Changed mentions of repeats to steps.

8/20/24

  • Part 3:

    • Added small section for FLUX tagging, will be expanded on later.

  • Part 4:

    • Major change to batch size explanation to reflect some new stuff I learned, but no new settings (still experimenting).

    • Tweaked description of LoCON experimental setting.

    • Changed epoch description to be step-based instead of repeat-based.

  • Part 5:

    • Removed first Q&A regarding steps vs. epochs.

8/7/24

  • Part 4:

    • Changed Debiased Estimation Loss parameter.

    • Tweaked AdamW weight decay.

    • Added section for newly attached json presets.

7/26/24

  • Part 2:

    • Added some extra explanation to the section regarding asymmetrical detail.

  • Part 2.5:

    • Updated section with note regarding general redundancy.

  • Part 4:

    • Added another alternative LR parameter pair for the XL setup.

    • Acknowledged the existence of SDPA.

7/23/24

  • Part 4:

    • Tweaked explanation of Epochs to make things clearer.

    • Partly re-wrote and reorganized section on Rank & Alpha for better clarity and readability.

    • Smaller tweaks throughout for clarity.

7/19/24

  • Part 4:

    • Added some alternative parameters from my recent testing. (XL LR and Noise Offset)

7/17/24

  • A few more grammar fixes I noticed throughout.

  • Changed some wording for clarity throughout as well.

  • Part 3:

    • Added subsection regarding color tagging.

  • Part 4:

    • Tweaked some sections for clarity.

    • Updated thoughts on DoRA.

6/24/24

  • General language/grammar fixes spread throughout I felt like correcting.

6/12/24

  • Part 4:

    • Fixed some minor grammar errors.

    • Changed recommended LRs for my SDXL settings.

    • Added some previously skipped settings.

    • Added sections referring to DoRA usage.

  • Part 9:

    • Created, covering LoRA resizing with my own settings.

5/28/24

  • Intro:

    • Tweaked a bit.

  • Part 1:

    • Made distinctions between 1.5 and new XL recommendations.

    • Tweaked some language minorly.

    • Added disclaimer note to my Pixiv recommendation.

  • Part 2:

    • Minor tweaks to fix grammar errors I noticed.

    • Added a small amount of added explanation to some sections.

    • Added a recommended upscaler.

  • Part 3:

    • Added some distinctions between SD1.5 & XL.

    • Tweaked some text to add some emphasis on certain parts.

    • Tweaked some wording in part 3.5.

    • Slightly updated tag examples in part 3.6.

  • Part 4:

    • bmaltais changed the GUI layout, so now I get to re-write the instructions >:(

      • Reformatted most of the section, and adopted some experimental settings into the standard setup.

      • Also added specifics for my SDXL settings.

3/24/24

  • Added missing learning rates for my AdamW setup. Whoops.

  • Slightly expanded Part 3.5.

3/4/24

  • Added section covering class tokens to Part 3.

  • Changed Part 3.5 (Tagging examples) to 3.6.

  • Added new Part 3.5, covering Prior Preservation.

    • Added Regularization Folder setting to Part 4.

  • Updated experimental settings.

  • Tweaked some descriptions of settings.

  • Added a few clarifications in misc. areas.

2/29/24

  • An extensive but overall minor rework, implementing critiques from fellow trainers, namely ArgentVASIMR.

    • Part 1:

      • Expanded on Duo/Trio/Group images.

    • Part 2:

      • Expanded/edited several paragraphs to provide more alternatives and clearer info.

    • Part 3:

      • Minor edits, changed terminology to be more appropriate.

    • Part 4:

      • Tweaked some settings, mostly in regards to Adam opt usage.

      • Added a bit more description to some settings.

      • Added parameters for caption shuffling.

      • Added experimental section regarding alpha scaling when not using adaptive optimizers.

    • Part 5:

      • Tweaked some wording to better adhere to proper terms.

      • Tweaked Q&A regarding Full fp16 training.

    • Part 6:

      • Tweaked some terminology, again.

      • Slightly changed and provided an example for the dataset balancing formula.

2/26/24

  • Updated experimental settings.

  • Tweaked some of the explanations minorly in regards to LyCORIS.

2/9/24

  • Updated experimental settings.

  • Added more details to part 4.

  • Added a brief section regarding some new findings to part 2.

2/1/24

  • Added part 2.5, a subsection regarding Nightshade and other AI "poisons".

1/7/24

  • Moved part 7 to part 8 & removed LyCORIS explanation.

  • Added (new) part 7, going more in-depth on LyCORIS.

  • Tweaked some experimental parameters.

1/5/24

  • Tweaked experimental settings & added some explanations to some values.

  • Added Q&A questions.

  • Expanded on the "scale weight norms" value in part 4.

  • Corrected sections regarding minsnr and zsnr to differentiate them correctly.

  • Tweaked "additional parameters": Value no longer required.

12/30/23

  • Added experimental settings to part 4.

  • Changed title to include LyCORIS.

  • Added Q&A question.

12/28/23

  • Correction of more grammar errors.

  • Slightly expanded Part 1 & 2.

  • Added section covering implied tags to Part 3.

  • Added minor elaborations to some areas.

12/27/23

  • Correction of minor grammar errors in parts 3 & 4.

  • Added new Q&A questions.

  • Added parts 6 & 7, covering Multi-concept and Style training respectively.

  • Added part 3.5 for tagging examples, added two to begin with.

12/26/23

  • Created Guide
