Non-Technical On-Site LoRA Training Guide Focusing on Dataset Creation🫐

You only need images and a kindergarten-level vocabulary to create a LoRA! It is very simple to approach, but very tedious to construct. So here’s a simple article to help you build your first LoRA or, if it is not your first, to add some insight to your understanding of creating one.

DISCLAIMER: I LUV CHARACTERS GOING OUTSIDE THEIR BIZNESS AND SHIPS SAILING PAST THE TITANIC

WE BEGIN:

“A good LoRA has a vision for a subject among a consistently diverse set of images. Its tags are non-duplicating, non-conflicting, and free from false positives, thus solidifying a dataset capable of being trained across all Generative Image Ecosystems.”

Are you confused?

If yes, consider reading from Level 0.

Are you partially confused and only need to affirm something?

Skip level 0.



LEVEL 0:

You can create a LoRA for almost anything: any subject that has images and can be described in simple words.

  • An unknown side character from your favorite scenic movie? Characters, Backgrounds and assets are beginner friendly

  • Have a concept art you want replicated in seconds to see if it is any good in scenes? Concepts and style are rather tricky


…I did not just rhyme them. Besides, why do you want to make a LoRA? For me, I love stories and I am a visual learner. So it is really therapeutic to generate full-blown, high-quality storyboard/book-cover-esque images, and entertaining when I want to see my favorite characters and ships (couples/groups) doing things outside their intended shows/business.

For whatever reason, you want to build a LoRA. A LoRA needs…

  • A Vision

  • A Subject

  • A Dataset

VISION is your imagination. What will your output images look like?

  • Will it add flavor?

  • Will it resemble likeness?

  • Will it manipulate elements?

SUBJECT can either be

  • Character = A person, animated character, animal, or monster

  • Background = A landmark or scenic location

  • Assets / Objects = Clothing, weapons, furniture, misc.

  • Concept = General Notion / something any character can do

  • Style = Themes, and general vibe to be layered on any of the above subjects

DATASET is your cooking pan. You COOK here. So when you cook, you have to be careful with whatever subject you are handling! Beginner-friendly ones are just eggs and hotdogs; the tricky ones have that lingering feeling, odor, and texture that need to be looked out for! Otherwise your dish will fail in front of the judge: the LoRA Trainer.

In short, a dataset is a collection of images with text files. 

  • Image - It contains visual elements that can be described

  • Text file - It corresponds to one image and contains the tags that describe the visual elements to anchor for training
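As a rough illustration of that image-plus-text-file pairing, the sketch below lists images that have no same-named text file next to them. The folder layout, the extension list, and the function name are assumptions made for this example, not something a trainer requires.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever your dataset actually contains.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def find_untagged_images(dataset_dir):
    """Return the names of images that have no same-named .txt tag file."""
    folder = Path(dataset_dir)
    missing = []
    for path in sorted(folder.iterdir()):
        if path.suffix.lower() in IMAGE_EXTENSIONS:
            # goblin_01.png is expected to be paired with goblin_01.txt
            if not path.with_suffix(".txt").exists():
                missing.append(path.name)
    return missing
```

Running it on a dataset folder before training is a cheap way to catch an image that would otherwise go in without tags.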



Where you source the images is entirely up to you but for tags, always remember to keep them simple!

ONE MORE NOTE BEFORE you continue: I have my own style of approaching LoRA creation. I focus more on the dataset than on the training parameters.

LEVEL 1: Perfectly Normal Datasets

A perfectly normal dataset has balance. Its images have a constant theme/focus element across varying contexts. For example, a goblin man with a third eye is your constant focus element. In his dataset, he is present across all images, which show him in various angles, actions, and interactions. Its text files contain simple words describing the changing elements.


Now, these questions may or may not have already jabbed your brain curlies

  • How many Images in total?

  • How many Images of this and that?

  • What kind of images?

  • How should I tag / caption?

  • What checkpoint to use?

  • What Base model?

  • What training parameters?


VISION #1 = I WANNA SEE THE CHARACTER/s in THEIR ORIGINAL LIKENESS!


STEP 1: Find images showing the character. 

  • They are the constant focus element of images in various actions / interactions

    • Mr. Third eyed goblin is speaking, or talking to another, or eating…

  • They can also be blurry or low quality, as long as the character stays the main focus

    • Mr. Third eyed goblin remains the main focus of an absurdly low resolution/quality image


STEP 2: Identify total image count

  • If # of images is less than 25, refer to LIMITED dataset in Level 2

  • If # of images is 25 or more, refer to NORMAL dataset here in Level 1


STEP 3: Image Curation (Takes up 80% of your LoRA creation time unless you have a scraper)

LoRAs for more than 2 characters, see Level 3

Vocabulary:

  1. Clone - An identical Image

  2. Altered - An altered image of the original


Solo NORMAL CHARACTER DATASET

When curating one LoRA character, I aim for 50-90 total images. These images are either purely solo (portrait, doing something, posing, etc.), or with other characters/subjects. (interacting, side-by-side, comic, etc.). Most of the images also show clear facial features especially on full body shots.

  1. Has front, back, side, and foreshortened views. (Minimum # for each: 1)

  2. Dutch angle and foreshortening are welcome

  3. Has an eye, and hair focus (Minimum # for each: 1)

  4. Has images for both lateral sides of the body and face

  5. Shows the face

  6. Shows the face + body

  7. Other special details, tattoos, scars, etc. (5% of the total image count)

    1. In focus, Clone at least twice for each special detail

    2. With the face/body, Clone at least thrice

  8. If Clone images are more than 3% of the total image count, duplicate the best images showing your character and alter them.

    1. Altered images are in a different aspect ratio

    2. Altered images should have enhanced colors or be overlaid with a color filter

    3. Altered images are in a different quality from the original

    4. Altered images are in a different style from the original
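A few items on this checklist are countable, so they can be checked mechanically. The sketch below is one way to do that, assuming you keep a hand-written record of which views each image shows. The view labels and the function name are hypothetical, and only the 50-90 total and the "minimum 1 each" items are covered.

```python
from collections import Counter

# Views that the checklist asks for at least once each (labels are my own).
REQUIRED_VIEWS = ["front", "back", "side", "foreshortened", "eye focus", "hair focus"]


def check_solo_dataset(image_views, low=50, high=90):
    """image_views maps a file name to the list of views that image shows.

    Returns a list of human-readable problems; an empty list means the
    checklist items covered here are satisfied.
    """
    problems = []
    total = len(image_views)
    if not low <= total <= high:
        problems.append(f"{total} images, outside the {low}-{high} target")
    counts = Counter(view for views in image_views.values() for view in views)
    for view in REQUIRED_VIEWS:
        if counts[view] < 1:
            problems.append(f"no image with view: {view}")
    return problems
```

The remaining items (clear faces, special details, clone ratios) still need a human eye.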


Couple NORMAL CHARACTER DATASET

When curating for a couple/duo LoRA, I aim for 50-90 total images. These images are either collaged (both solo images of the characters are saved as 1 image next to each other) or both characters are in 1 image

  1. Shows the face of both 

  2. Shows the face + body of both

  3. Shows front-to-front, side-by-side, front-to-side, back-to-back, back-to-front, back-to-side (Minimum # for each: 1)

  4. Dutch angle and foreshortening are welcome

  5. Does not focus on one character’s features

  6. Both characters are interacting with each other 

  7. Both characters are not interacting with each other but still in one image

  8. Clone 3 images to solidify body / facial details

STEP 4: Tag them Up! (Takes up 20% or less of your LoRA creation time especially with an autotagger)


Solo NORMAL CHARACTER DATASET

Tagging style differs across LoRA creators. For me:

I have two tags present in all images, (Core tag, anime screencap)

  1. Core tag = (Name + [different colored Skin tone here] what they are [with if there is any] focus). This also works with captions.

  2. Anime Screencap = For switching on and off the average style in the dataset


  • Send your images to an autotagger (CIVIT’s Trainer has one)

  • Remove all biological related tags (skin color, torso features, leg features) except for the genitals.

    1. What I generally remove:

male focus, 1boy, muscular, muscular male, toned, toned male, bara, nipples, no nipples, pectorals, large pectorals, breasts, large breasts, medium breasts, small breasts, abs, navel, collarbone, thighs, thick thighs, biceps, arms, thick arms, mature male, manly, stomach, multiple boys, yaoi, hetero, couple, dark skin, dark-skinned male, ass, multiple penises, parody, style parody, official style, fake screenshot, virtual youtuber, anime coloring, no humans, horns, digimon (creature), monster, colored skin, duel monster, pokemon (creature), pointy ears, thick eyebrows, fat man, shota, child, loli, old, old man, father and son, father and daughter, rape, elf, orc, robot, mecha, centaur, fish boy, monster boy, devil, devil boy, devil girl, mermaid, merman, animification, scene reference, cosplay, fantasy, science fiction, horror (theme), furry male, fine art parody, pale skin, multiple girls, multiple others, yuri, cyborg, dark-skinned female, mother and daughter, daughter, mother and son, incest, husband and wife, mature female, height difference, size difference, age difference, siblings, sideburns, long sideburns, sanpaku, fujimaru ritsuka (male), satou kazuma, natsuki subaru, uzumaki naruto, uchiha sasuke, kaito (vocaloid), hair between eyes, bangs, parted bangs, archer (fate), undercut, monkey d. luffy, roronoa zoro, sanji (one piece), male child, super saiyan, viktor nikiforov, producer (idolmaster), kyon, reiner braun, eren yeager, cyan skin, purple skin, multicolored skin, antenna

GONNA ADD MORE IF THERE ARE CHARACTER NAMES OR CONFLICTING TAGS I HAVE YET TO ADD HERE

  • Inconsistent elements that are in fewer than 15 images should be tagged

  • Change/prune the character’s hair tags to one single unique tag: hair color_character name_hair

    1. NOTE: If you have images with other characters, remove their hair colors/styles

  • Change/prune the character’s eye tags to one single unique tag: eye color_character name_eyes

    1. NOTE: If you have images with other characters, remove their eye color.

  • Add the anime screencap

  • Add “low quality, blurry, worst quality” on blurry/low quality images

  • Review all tags

  • Send to Training, see level 4
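The tagging steps above can be sketched as one function that cleans a single image's autotagger output. This is only an illustration: the function name, the example tag names, and the rule "any tag ending in ` hair` or ` eyes` belongs to the character" are assumptions, and the last one will mis-handle tags such as `closed eyes`, so the "review all tags" step still applies.

```python
def clean_tags(raw_tags, core_tag, hair_tag, eye_tag, remove, blurry=False):
    """Apply the tagging steps to one image's comma-separated tag string."""
    tags = [t.strip() for t in raw_tags.split(",") if t.strip()]
    kept = []
    for tag in tags:
        if tag in remove:
            continue  # biological or conflicting tag from the removal list
        # Assumption: autotagger hair/eye tags end in " hair" / " eyes".
        if tag.endswith(" hair"):
            tag = hair_tag
        elif tag.endswith(" eyes"):
            tag = eye_tag
        kept.append(tag)
    ordered = [core_tag] + kept + ["anime screencap"]
    if blurry:
        ordered += ["low quality", "blurry", "worst quality"]
    # dict.fromkeys drops duplicates while keeping the first occurrence.
    return ", ".join(dict.fromkeys(ordered))
```

For example, with `remove = {"1boy", "pointy ears"}`, the hypothetical input `"1boy, green hair, long hair, red eyes, pointy ears, sitting"` collapses both hair tags into the single unique hair tag and keeps `sitting`, because the pose changes from image to image.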


Couple NORMAL CHARACTER DATASET

Tagging style differs across LoRA creators. For me:

I have two tags present in all images, (Core tag, anime screencap)

  • Core tag = (Name and name + what couple focus). This also works with captions.

  • Anime Screencap = For switching on and off the average style in the dataset


  • Send your images to an autotagger (CIVIT’s Trainer has one)

  • Remove all biological related tags (skin color, torso features, leg features) except for the genitals.

    • What I generally remove:

male focus, 1boy, muscular, muscular male, toned, toned male, bara, nipples, no nipples, pectorals, large pectorals, breasts, large breasts, medium breasts, small breasts, abs, navel, collarbone, thighs, thick thighs, biceps, arms, thick arms, mature male, manly, stomach, multiple boys, yaoi, hetero, couple, dark skin, dark-skinned male, ass, multiple penises, parody, style parody, official style, fake screenshot, virtual youtuber, anime coloring, no humans, horns, digimon (creature), monster, colored skin, duel monster, pokemon (creature), pointy ears, thick eyebrows, fat man, shota, child, loli, old, old man, father and son, father and daughter, rape, elf, orc, robot, mecha, centaur, fish boy, monster boy, devil, devil boy, devil girl, mermaid, merman, animification, scene reference, cosplay, fantasy, science fiction, horror (theme), furry male, fine art parody, pale skin, multiple girls, multiple others, yuri, cyborg, dark-skinned female, mother and daughter, daughter, mother and son, incest, husband and wife, mature female, height difference, size difference, age difference, siblings, sideburns, long sideburns, sanpaku, fujimaru ritsuka (male), satou kazuma, natsuki subaru, uzumaki naruto, uchiha sasuke, kaito (vocaloid), hair between eyes, bangs, parted bangs, archer (fate), undercut, monkey d. luffy, roronoa zoro, sanji (one piece), male child, super saiyan, viktor nikiforov, producer (idolmaster), kyon, reiner braun, eren yeager, cyan skin, purple skin, multicolored skin, antenna, 1girl, 2boys, 2girls

GONNA ADD MORE IF THERE ARE CHARACTER NAMES OR CONFLICTING TAGS I HAVE YET TO ADD HERE

  • Inconsistent elements that are in fewer than 15 images should be tagged

  • Change/prune the character’s hair tags to one single unique tag: hair color_character name_hair

    • NOTE: If you have images with other characters, remove their hair colors/styles

  • Change/prune the character’s eye tags to one single unique tag: eye color_character name_eyes

    • NOTE: If you have images with other characters, remove their eye color.

  • Add the anime screencap

  • Add “low quality, blurry, worst quality” on blurry/low quality images

  • Review all tags

  • Send to Training, see level 4


VISION #2 = I WANNA BACKGROUND / ASSETS FOR CHARACTERS!

STEP 1: Find images of your chosen background / asset (object)

  • They are the constant focus element of images in various actions / interactions

  • They can also be blurry or low quality, as long as the background / asset stays the main focus

STEP 2: Identify total image count

  • If # of images is less than 15, refer to LIMITED dataset in Level 2

  • If # of images is 15 or more, refer to NORMAL dataset here in Level 1

STEP 3: Image Curation (Takes up 80% of your LoRA creation time unless you have a scraper)

LoRAs for more than 2 characters, see Level 3


BACKGROUND DATASET

If indoors focus on indoors, if outdoors focus on outdoors. If wanting both, proceed to Level 3

  1. Shows different natural lighting (day, afternoon, night)

  2. Shows environmental interaction (raining, snowing, fire, thunder, etc.)

  3. Shows characters / animals (not in focus) together with the background

  4. Shows different viewing angles / perspectives

  5. Include cropped sections of the background

ASSET DATASET

  1. Shows different natural lighting (day, afternoon, night)

  2. Shows environmental interaction (raining, snowing, fire, thunder, etc.)

  3. Shows characters / animals together with the asset

  4. Shows different viewing angles / perspectives

  5. Shows the asset in different background

  6. Include cropped sections of the asset

STEP 4: Tag them Up! (Takes up 20% or less of your LoRA creation time especially with an autotagger)


Tagging style differs across LoRA creators. For me:

  • I have two tags present in all images, (Core tag, anime screencap)

    • Core tag = (Name + focus). This also works with captions.

    • Anime Screencap = For switching on and off the average style in the dataset


BACKGROUND DATASET

If indoors focus on indoors, if outdoors focus on outdoors. If wanting both, proceed to Level 3

  • Send your images to an autotagger (CIVIT’s Trainer has one)

  • Remove all tags related to your BG

    • Ex. A Cityscape BG - remove all building tags, and sky

    • Ex. A living room - remove all furniture tags

  • Inconsistent elements that are in fewer than 15 images should be tagged

  • Add the core tag

    • Ex. A cyberpunk cityscape BG - cyberpunk scenery-cityscape focus

    • Ex. A goth indoor living room - goth interior scenery - living room focus,

  • Add the anime screencap

  • Add “low quality, blurry, worst quality” on blurry/low quality images

  • Review all tags

  • Send to Training, see level 4
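The "fewer than 15 images" rule is easy to check by counting how many text files each tag appears in. The sketch below assumes the captions are already loaded into a dictionary of file name to tag string. That structure and the function name are illustrative choices, not part of any trainer.

```python
from collections import Counter


def tags_below_threshold(captions, threshold=15):
    """captions maps a file name to its comma-separated tag string.

    Returns the tags that appear in fewer than `threshold` images, i.e. the
    inconsistent elements that the step above says should stay tagged.
    """
    counts = Counter()
    for text in captions.values():
        # A set, so a tag repeated inside one file is counted only once.
        counts.update({t.strip() for t in text.split(",") if t.strip()})
    return sorted(tag for tag, n in counts.items() if n < threshold)
```

Anything this returns is a changing element (rain, a passing character, a time of day) rather than part of the constant subject.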

ASSET DATASET

  • Send your images to an autotagger (CIVIT’s Trainer has one)

  • Remove all tags related to your asset

    • Ex. A wooden chair - remove all characteristics of the wooden chair (wood, chair, etc.)

  • Inconsistent elements that are in fewer than 15 images should be tagged

  • Add the core tag

    • Ex. A chair with a skull - chair with skull (object) focus

    • Ex. A bed with teeth - bed with teeth (object) focus

  • Add the anime screencap

  • Add “low quality, blurry, worst quality” on blurry/low quality images

  • Review all tags

  • Send to Training, see level 4

VISION #3 = I WANNA CONCEPT / STYLE FOR EVERYTHING!

STEP 1: Find images showing the concept / style. 

  • Has the general vibe of the concept / style

  • They can also be blurry or low quality, as long as the concept / style is still recognizable

STEP 2: Identify total image count

  • If # of images is less than 35, refer to LIMITED dataset in Level 2

  • If # of images is 35 or more, refer to NORMAL dataset here in Level 1
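The three STEP 2 thresholds in this level (25 for characters, 15 for backgrounds / assets, 35 for concepts / styles) can be collected into one small lookup. The function and dictionary names below are made up for this sketch.

```python
# Minimum image counts for a NORMAL dataset, taken from STEP 2 of each vision.
NORMAL_MINIMUM = {
    "character": 25,
    "background": 15,
    "asset": 15,
    "concept": 35,
    "style": 35,
}


def dataset_level(subject, image_count):
    """Return which level of this guide applies to the dataset."""
    if image_count >= NORMAL_MINIMUM[subject]:
        return "NORMAL (Level 1)"
    return "LIMITED (Level 2)"
```

So 34 concept images still count as a LIMITED dataset, while 15 asset images are already NORMAL.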

STEP 3: Image Curation (Takes up 80% of your LoRA creation time unless you have a scraper)

LoRAs for more than 2 characters, see Level 3


CONCEPT DATASET

Shows the concept with characters (1girl, 1boy, 2girls, 2boys, 1other, animals, monsters, etc.)

  • BODY CONCEPTS

    • Shows the concept in different angles / perspectives

    • Shows the concept being interacted with by the character / environment

      • Ex. Sharingan eye looking at wall

      • Ex. mutated hand grabbing things

    • 5% of the total images focus on the body concept

  • ACTION / POSE CONCEPTS

    • Shows the concept in different angles / perspectives

    • Shows the concept being interacted with by the character / environment

      • Ex. Unique standing pose at Mt. Everest peak looking at wall

      • Ex. Unique couple position while it is raining

    • Shows a black/white silhouette of the concept

STYLE DATASET

  • Any images with the general vibe of the style are okay!


STEP 4: Tag them Up! (Takes up 20% or less of your LoRA creation time especially with an autotagger)

Tagging style differs across LoRA creators. For me:

  • I have two tags present in all images, (Core tag, anime screencap)

    • Core tag = (Name + focus). This also works with captions.

    • Anime Screencap = For switching on and off the average style in the dataset

CONCEPT DATASET

Shows the concept with characters (1girl, 1boy, 2girls, 2boys, 1other, animals, monsters, etc.)

  • Send your images to an autotagger (CIVIT’s Trainer has one)

  • Remove all tags related to your concept

    • Ex. NSFW focused concept - remove all body parts around the body part your NSFW LoRA will be focused on

    • Ex. Sharingan / mutated hand - remove all eye and arm tags

  • Inconsistent elements that are in fewer than 15 images should be tagged

  • Add the core tag

    • Ex. NSFW focused concept - NSFW action focus

    • Ex. Sharingan / mutated hand - 1other with sharingan and mutated hand focus

  • Add the anime screencap

  • Add “low quality, blurry, worst quality” on blurry/low quality images

  • Review all tags

  • Send to Training, see level 4

STYLE DATASET

  • Send your images to an autotagger (CIVIT’s Trainer has one)

  • TAG EVERYTHING or NOT. The latter will need more epochs

  • Add the anime screencap

  • Add “low quality, blurry, worst quality” on blurry/low quality images

  • Review all tags

  • Send to Training, see level 4





LEVEL 2: LIMITED DATASETS

This is where the sources are nearly nonexistent, so I rely on concept bleeding and NSFW bias.

SINGLE IMAGE LoRA

Do you only have one image? 

I also made my OC’s LoRA from 1 image. But it did not take a single version! There were three! This one is the third and final version: https://civitai.com/models/1464742

I only know one approach to single-image training: LET THE NSFW BLEED! This also works with concepts, backgrounds, and assets.


1st version:

15-20 total images is the goal!

STEP 1: Finding that One Image

STEP 2: Crop that single image for every element you want the trainer to learn. 

  • For character - I crop the head, then to the eyes, nose distance, forehead distance, hair, etc. then the body, etc. Also include cropping the character’s lateral and transverse anatomy.

  • For body concept - I mainly focus on that body concept and training it as if it is an asset

  • For action concept - The body parts involved with the action are cropped

  • For Backgrounds - Cut it like it’s pizza

  • For Assets - Crop it laterally and transverse like you are cutting a mannequin
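For the background case, one simple reading of "cut it like pizza" is an even grid of crops. The sketch below only computes the crop rectangles and leaves the actual cropping to whatever image editor or library you use. The grid interpretation and the function name are assumptions, and character crops (eyes, hair, and so on) still have to be picked by hand.

```python
def grid_crop_boxes(width, height, rows, cols):
    """Split an image of width x height pixels into rows x cols crop boxes.

    Each box is (left, upper, right, lower), the same order that Pillow's
    Image.crop expects, so the boxes can be passed straight to it.
    """
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = c * width // cols
            upper = r * height // rows
            right = (c + 1) * width // cols
            lower = (r + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes
```

A 2 x 2 grid on one background already gives four extra images toward the 15-20 goal, before any altered clones are made.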

STEP 3: Cloning for Altered Images

  • Clone every cropped image you made

  • Edit the orientation, aspect ratio, and angle

    • Ex. a horizontally cropped lateral image of a bunny → edited the angle, orientation, and aspect ratio

  • Add a color overlay filter, GRAYSCALE IS A MUST!

OPTIONAL STEP: Add other related images from any source.

  • For Daberry’s first version, I generated a blue PP and added it to the dataset, allowing NSFW to bleed.

STEP 4: Tag the elements like a normal dataset. This time, manual tagging is the best approach.

  • If the images have a constant background, add “crossover”

STEP 5: Proceed to Level 4 for Training

STEP 6: Proceed to Level 5 for Prompting



2nd version

25-35 images is the goal!

STEP 1: GENERATE LOTS OF NSFW IMAGES from Level 5

  • If it’s a background, clothing, or asset, try to add some skin. LET THE NSFW BLEED IN

STEP 2: Tag them up with the same core tag and unique tags

STEP 3: Send to Training

FINAL SAY: If you can generate with the LoRA at 0.8 strength, there is no need to proceed with a 3rd version unless you wanna.


3+ Multiple people in one image LoRA

The trickiest part of AI generation is specifying multiple subjects, so this dataset is even trickier than the others. Luckily, I can confirm that collaging multiple solos together can work.

GOAL: 30-70 images

STEP 1: Find as many images of the characters together as you can. If there are none, get their solos and create the collages.

STEP 2: Tag in the same way normal couple LoRAs are tagged. 

  • The core tag having the names of the characters + (input here number and gender) focus

    • Ex: luke pearce and artem wing and vyn richter and marius von hagen_4boys focus, 
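The core tag pattern in that example can be written as a one-line helper. The function name and its arguments are hypothetical; the output format simply mirrors the example above.

```python
def group_core_tag(names, group_word):
    """Join character names into one core tag, e.g. '..._4boys focus'.

    group_word is the plural gender word used in the tag ('boys', 'girls').
    """
    return f"{' and '.join(names)}_{len(names)}{group_word} focus"
```

Calling it with the four names from the example and `"boys"` reproduces the same core tag, minus the trailing comma.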

STEP 3: Send to Training

STEP 4: Prompting

LEVEL 3: BUNDLES OF CONVENIENCE

Here is where I asked, “can I have a minicheckpoint???” 

Answer: yes. But single LoRAs are better at fully encapsulating your vision. The chances of concepts bleeding are high with bundles, which is why these bundles are best when the intended generations are subconcepts of the overall concept the LoRA is envisioned for.



CHARACTER BUNDLE:

GOAL: 69 images per character

STEP 1: Gather the Characters!

STEP 2: Curate as if you are curating a normal dataset.

STEP 3: Tag as if you are tagging a normal character. YES, do not let a single biological tag be present on two characters, otherwise the bundle will be a mess

STEP 4: Send to Training

STEP 5: Prompting




BACKGROUNDS BUNDLE:

GOAL: 45 images per Background

STEP 1: Gather the Backgrounds!

STEP 2: Curate as if you are curating a normal dataset.

STEP 3: Tag as if you are tagging a normal dataset.

STEP 4: Send to Training

STEP 5: Prompting




COSTUME/CLOTHING BUNDLE:

GOAL: 30 images per Costume

STEP 1: Gather the Costumes!

STEP 2: Curate as if you are curating a normal dataset.

STEP 3: Tag as if you are tagging a normal dataset.

STEP 4: Send to Training

STEP 5: Prompting




CONCEPT BUNDLE:

GOAL: 25 images per Concept

STEP 1: Gather the Concepts!

STEP 2: Curate as if you are curating a normal dataset.

STEP 3: Tag as if you are tagging a normal dataset.

STEP 4: Send to Training

STEP 5: Prompting

LEVEL 4: TRAINING PARAMETERS

All Character Datasets

I aim for 1100-1200 total steps in 1 Epoch.

{
  "engine": "kohya",
  "unetLR": 0.0005,
  "clipSkip": 2,
  "loraType": "lora",
  "keepTokens": 1,
  "networkDim": 16,
  "numRepeats": "THIS WILL DIFFER",
  "resolution": 1024,
  "lrScheduler": "cosine_with_restarts",
  "minSnrGamma": 5,
  "noiseOffset": 0.1,
  "targetSteps": 1120,
  "enableBucket": true,
  "networkAlpha": 8,
  "optimizerType": "Adafactor",
  "textEncoderLR": 0.00005,
  "maxTrainEpochs": 1,
  "shuffleCaption": true,
  "trainBatchSize": 3,
  "flipAugmentation": false,
  "lrSchedulerNumCycles": 3
}
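Since numRepeats is the one value that changes per dataset, here is a small sketch of how it could be worked out from the step target. It assumes the usual kohya-style counting, steps = images × repeats × epochs / batch size. That formula and the function names are my assumption, so check the step count your trainer actually reports.

```python
import math


def repeats_for_target(image_count, target_steps, batch_size=3, epochs=1):
    """Smallest numRepeats whose step count reaches target_steps.

    Assumes steps = image_count * repeats * epochs / batch_size.
    """
    return math.ceil(target_steps * batch_size / (image_count * epochs))


def total_steps(image_count, repeats, batch_size=3, epochs=1):
    """Step count under the same assumption, rounded up to whole batches."""
    return math.ceil(image_count * repeats * epochs / batch_size)
```

Under that assumption, a 60-image character dataset with batch size 3 and 1 epoch needs 56 repeats to land on 1120 steps, and a 75-image dataset needs 45 repeats, which gives 1125 steps and still sits inside the 1100-1200 range.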



All Concepts / Backgrounds / Style / Assets / Bundles

{
  "engine": "kohya",
  "unetLR": 0.0005,
  "clipSkip": 2,
  "loraType": "lora",
  "keepTokens": 1,
  "networkDim": 8,
  "numRepeats": "THIS WILL DIFFER",
  "resolution": 1024,
  "lrScheduler": "cosine_with_restarts",
  "minSnrGamma": 5,
  "noiseOffset": 0.1,
  "targetSteps": 1105,
  "enableBucket": true,
  "networkAlpha": 8,
  "optimizerType": "Adafactor",
  "textEncoderLR": 0.00005,
  "maxTrainEpochs": 1,
  "shuffleCaption": true,
  "trainBatchSize": 3,
  "flipAugmentation": false,
  "lrSchedulerNumCycles": 3
}
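The two configs above differ only in networkDim and targetSteps (plus numRepeats, which depends on the dataset). One way to keep that visible is a shared base with small overrides, sketched below. The variable and function names are illustrative, and the dictionaries only mirror the values listed in this level.

```python
# Settings shared by both configs above (numRepeats is left out on purpose,
# because it differs per dataset).
BASE = {
    "engine": "kohya", "unetLR": 0.0005, "clipSkip": 2, "loraType": "lora",
    "keepTokens": 1, "resolution": 1024, "lrScheduler": "cosine_with_restarts",
    "minSnrGamma": 5, "noiseOffset": 0.1, "enableBucket": True,
    "networkAlpha": 8, "optimizerType": "Adafactor", "textEncoderLR": 0.00005,
    "maxTrainEpochs": 1, "shuffleCaption": True, "trainBatchSize": 3,
    "flipAugmentation": False, "lrSchedulerNumCycles": 3,
}

# The only two values that differ between the configs.
OVERRIDES = {
    "character": {"networkDim": 16, "targetSteps": 1120},
    "other": {"networkDim": 8, "targetSteps": 1105},
}


def build_config(kind, num_repeats):
    """Return one full config dict for 'character' or 'other' datasets."""
    return {**BASE, **OVERRIDES[kind], "numRepeats": num_repeats}
```

Here `"other"` stands for the concepts / backgrounds / style / assets / bundles config. Passing the result to `json.dumps` would give back the same shape as the blocks above.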

LEVEL 5: LoRA Flexibility + Prompting Skill Check

The LoRA has finished training, so you wanna test it immediately. Use checkpoints that you feel do not have an overwhelming bias toward something. Oftentimes, the potential of a LoRA cannot be realized because of inadequate prompting. See prompting guides, because my only advice is to keep the prompt simple.


NOW TO LORA CHECKING:


Can it do what you envisioned?

If yes, Yayy! You’ve got yourself a catering LoRA. 


But how flexible is it?

  • If character is able to do a pose it is not trained on, Yayy

  • If background is able to add something doing something, Yayy

  • If clothing / assets can be used, broken, or turned into a monster, Yayy

  • If concept / style can cater to aliens with the most morbid and gore features, Yayy


That being Yayy, can other LoRAs stack with it? NOTE: this test also depends on the other LoRAs’ flexibility so take it slow.

  • You want to check if your character LoRA can be paired with another character LoRA + a fighting concept LoRA. If Yayy then Congrats!!

  • You want to check if the background can be placed within a frame LoRA inception with a living room LoRA, if Yayy then SLAYYY!

  • You want to check if concept / style can merge with other LoRA concepts / styles, if Yayy then Hooray!



