You only need images and a kindergarten degree in vocabulary to create a LoRA! Very simple to approach, but very tedious to construct. So here's a simple article to help you build your first LoRA or, if it isn't your first, add more insight to your understanding of creating one.
DISCLAIMER: I LUV CHARACTERS GOING OUTSIDE THEIR BIZNESS AND SHIPS SAILING PAST TITANIC
WE BEGIN:
“A good LoRA has a vision for a subject among a consistently diverse set of images. Its tags are non-duplicating, non-conflicting, and free from false positives. This solidifies a dataset capable of being trained across all generative image ecosystems.”
Are you confused?
If yes, consider reading from Level 0.
Are you partially confused and only need to affirm something?
Skip level 0.
LEVEL 0:
You can create a LoRA for almost anything: any subject that has images and can be described in simple words.
An unknown side character from your favorite scenic movie? Characters, backgrounds, and assets are beginner friendly.
Have concept art you want replicated in seconds to see if it is any good in scenes? Concepts and style are rather tricky.
…I did not just rhyme them. Besides, why do you want to make a LoRA? For me, I love stories and I am a visual learner. So it is really therapeutic to generate full-blown, high-quality storyboard/book-cover-esque images, and entertaining to see my favorite characters and ships (couples/groups) doing things outside their intended shows/business.
Whatever your reason for building a LoRA, it needs…
A Vision
A Subject
A Dataset
VISION is your imagination. What will your output images look like?
Will it add flavor?
Will it resemble likeness?
Will it manipulate elements?
SUBJECT can be one of:
Character = A person, animated character, animal, or monster
Background = A landmark or scenic location
Assets / Objects = Clothing, weapons, furniture, misc.
Concept = A general notion / something any character can do
Style = Themes and a general vibe to be layered on any of the above subjects
DATASET is your cooking pan. You COOK here. So when you cook, you have to be careful with whatever subject you are handling! Beginner-friendly ones are just eggs and hotdogs; the tricky ones got that lingering feel, odor, and texture that needs looking out for! Otherwise your dish will be a fail for the judge: the LoRA trainer.
In short, a dataset is a collection of images with text files.
Image - It contains visual elements that can be described
Text file - It corresponds to one image (usually by sharing its filename) and contains tags describing the visual elements to anchor during training
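If you want to double-check that every image has its text file, here's a minimal Python sketch. It assumes the common convention (and the one I'm assuming here) where each caption is a `.txt` file with the same base name as its image, like `goblin_001.png` + `goblin_001.txt`. The extension list and function names are my own, so adjust them to your dataset.

```python
from pathlib import Path

# Image extensions to look for (an assumption; adjust to your dataset).
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def pair_dataset(filenames):
    """Split filenames into (pairs, images_without_caption, captions_without_image).

    Assumes each caption is a .txt file with the same base name as its image.
    """
    images = {}
    captions = set()
    for name in filenames:
        path = Path(name)
        suffix = path.suffix.lower()
        if suffix in IMAGE_EXTS:
            images[path.stem] = name
        elif suffix == ".txt":
            captions.add(path.stem)
    pairs = sorted((images[s], s + ".txt") for s in images if s in captions)
    missing_captions = sorted(images[s] for s in images if s not in captions)
    orphan_captions = sorted(s + ".txt" for s in captions if s not in images)
    return pairs, missing_captions, orphan_captions


def check_folder(folder):
    """Run pair_dataset on every file directly inside `folder`."""
    return pair_dataset(p.name for p in Path(folder).iterdir() if p.is_file())
```

Run `check_folder("path/to/dataset")` and fix anything in the last two lists (images with no caption, captions with no image) before training.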
Where you source the images is entirely up to you but for tags, always remember to keep them simple!
PLUS ANOTHER NOTE B4 YOU CONTINUE: I have my own style of approaching LoRA creation. I focus more on the dataset than on the training parameters.
LEVEL 1: Perfectly Normal Datasets
A perfectly normal dataset has balance. Its images have a constant theme/focus element across varying contexts. For example, a goblin man with a third eye is your constant focus element. He is present in every image of his dataset, shown from various angles and in various actions and interactions. The text files contain simple words describing the elements that change.
Now, these questions may or may not have already jabbed your brain curlies:
How many images in total?
How many images of this and that?
What kind of images?
How should I tag / caption?
What checkpoint to use?
What base model?
What training parameters?
VISION #1 = I WANNA SEE THE CHARACTER/s in THEIR ORIGINAL LIKENESS!
STEP 1: Find images showing the character.
They are the constant focus element of images in various actions / interactions
Mr. Third-eyed goblin is speaking, talking to another, or eating…
They may even be blurry or low quality
Mr. Third-eyed goblin remains the main focus even in an absurdly low resolution/quality image
STEP 2: Identify total image count
If you have fewer than 25 images, refer to the LIMITED dataset in Level 2
If you have 25 or more images, refer to the NORMAL dataset here in Level 1
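The count check is just a threshold, and the cutoff depends on the subject. A tiny sketch using the numbers from this guide (25 for characters, 15 for backgrounds/assets, 35 for concepts/styles); the subject labels and function name are my own naming.

```python
# Minimum image counts for a NORMAL dataset, taken from the thresholds in this guide.
NORMAL_MINIMUM = {
    "character": 25,
    "background": 15,
    "asset": 15,
    "concept": 35,
    "style": 35,
}


def dataset_level(subject, image_count):
    """Return which level of the guide to follow for this subject and image count."""
    if image_count >= NORMAL_MINIMUM[subject]:
        return "NORMAL (Level 1)"
    return "LIMITED (Level 2)"
```

For example, `dataset_level("character", 24)` sends you to the LIMITED route, while 25 images keeps you here.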
STEP 3: Image Curation (Takes up 80% of your LoRA creation time unless you have a scraper)
For LoRAs with more than 2 characters, see Level 3
Vocabulary:
Clone - An identical copy of an image
Altered - An edited copy of the original image
Solo NORMAL CHARACTER DATASET
When curating a single-character LoRA, I aim for 50-90 total images. These images are either purely solo (portrait, doing something, posing, etc.) or with other characters/subjects (interacting, side-by-side, comic, etc.). Most of the images also show clear facial features, especially the full-body shots.
Has front, back, side, and foreshortened views (Minimum # for each: 1)
Dutch angle and foreshortening are welcome
Has eye-focus and hair-focus shots (Minimum # for each: 1)
Has images for both lateral sides of the body and face
Shows the face
Shows the face + body
Other special details: tattoos, scars, etc. (5% of the total image count)
In focus: clone at least twice for each special detail
With the face/body: clone at least three times
If clones make up more than 3% of the total image count, duplicate the best images showing your character and alter them:
Altered images have a different aspect ratio
Altered images have enhanced colors or are overlaid with a color filter
Altered images have a different quality from the original
Altered images have a different style from the original
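To turn those percentages into actual counts: with 70 images, 5% is 3.5 (so about 4 special-detail images) and 3% is 2.1 (so a third clone tips you over into altering). A quick sketch of that arithmetic, using integer math to avoid float rounding surprises; rounding the special-detail share up is my own choice, not a rule from the steps above.

```python
def curation_budget(total_images):
    """Return (special_detail_images, max_clones) for a dataset of total_images."""
    # ~5% of the dataset shows special details (tattoos, scars, ...).
    # Rounding up is my assumption so small datasets still get at least one.
    special_detail_images = (total_images * 5 + 99) // 100
    # Most clones you can have while staying at or under 3% of the dataset.
    max_clones = total_images * 3 // 100
    return special_detail_images, max_clones


def should_alter(clone_count, total_images):
    """True once clones make up MORE than 3% of the dataset."""
    return clone_count * 100 > total_images * 3
```

`curation_budget(70)` gives `(4, 2)`: about 4 special-detail images, and at most 2 clones before you switch to altered copies.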
Couple NORMAL CHARACTER DATASET
When curating for a couple/duo LoRA, I aim for 50-90 total images. These images are either collages (solo images of both characters saved side by side as one image) or images with both characters in them.
Shows the face of both
Shows the face + body of both
Shows front-to-front, side-by-side, front-to-side, back-to-back, back-to-front, back-to-side (Minimum # for each: 1)
Dutch angle and foreshortening are welcome
Does not focus on one character’s features
Both characters are interacting with each other
Both characters are not interacting with each other but still in one image
Clone 3 images to solidify body / facial details
STEP 4: Tag them Up! (Takes up 20% or less of your LoRA creation time especially with an autotagger)
Solo NORMAL CHARACTER DATASET
Tagging style differs across LoRA creators. For me:
I have two tags present in all images: the core tag and anime screencap.
Core tag = Name + [skin tone, if it is an unusual color] + what they are + [with + special feature, if any] + focus. This also works with captions.
Anime screencap = For switching the dataset's average style on and off
Send your images to an autotagger (CIVIT’s Trainer has one)
Remove all biology-related tags (skin color, torso features, leg features) except for the genitals.
What I generally remove:
male focus, 1boy, muscular, muscular male, toned, toned male, bara, nipples, no nipples, pectorals, large pectorals, breasts, large breasts, medium breasts, small breasts, abs, navel, collarbone, thighs, thick thighs, biceps, arms, thick arms, mature male, manly, stomach, multiple boys, yaoi, hetero, couple, dark skin, dark-skinned male, ass, multiple penises, parody, style parody, official style, fake screenshot, virtual youtuber, anime coloring, no humans, horns, digimon (creature), monster, colored skin, duel monster, pokemon (creature), pointy ears, thick eyebrows, fat man, shota, child, loli, old, old man, father and son, father and daughter, rape, elf, orc, robot, mecha, centaur, fish boy, monster boy, devil, devil boy, devil girl, mermaid, merman, animification, scene reference, cosplay, fantasy, science fiction, horror (theme), furry male, fine art parody, pale skin, multiple girls, multiple others, yuri, cyborg, dark-skinned female, mother and daughter, daughter, mother and son, incest, husband and wife, mature female, height difference, size difference, age difference, siblings, sideburns, long sideburns, sanpaku, fujimaru ritsuka (male), satou kazuma, natsuki subaru, uzumaki naruto, uchiha sasuke, kaito (vocaloid), hair between eyes, bangs, parted bangs, archer (fate), undercut, monkey d. luffy, roronoa zoro, sanji (one piece), male child, super saiyan, viktor nikiforov, producer (idolmaster), kyon, reiner braun, eren yeager, cyan skin, purple skin, multicolored skin, antenna, GONNA ADD MORE IF THERE ARE CHARACTER NAMES OR CONFLICTING TAGS I HAVE NOT YET ADDED HERE
Inconsistent elements that appear in fewer than 15 images should be tagged
Change/prune the character’s hair tags to one single unique tag: Hair color_character name_hair
NOTE: If you have images with other characters, remove their hair colors/styles
Change/prune the character’s eye tags to one single unique tag: eye color_character name_eyes
NOTE: If you have images with other characters, remove their eye color.
Add the anime screencap
Add “low quality, blurry, worst quality” on blurry/low quality images
Review all tags
Send to Training, see level 4
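The solo tagging steps above can be sketched as one pass over each caption. This is a minimal sketch, not a drop-in tool: it assumes captions are single-line, comma-separated tag lists (typical autotagger output), `REMOVE` is only a short excerpt of the removal list, and the character "zog" with his hair/eye tags is a hypothetical example of the `color_name_hair` / `color_name_eyes` pattern. The "ends with hair/eyes" matching is a rough heuristic, so still review the results by hand.

```python
# Short excerpt of the removal list above; paste in the full list for real use.
REMOVE = {
    "1boy", "male focus", "muscular", "muscular male", "pectorals", "abs",
    "dark skin", "dark-skinned male", "colored skin", "bangs", "hair between eyes",
}
# Only "<color> eyes" counts as an eye tag, so tags like "closed eyes" survive.
EYE_COLORS = {"black", "blue", "brown", "green", "grey", "orange",
              "pink", "purple", "red", "white", "yellow"}
LOW_QUALITY = ["low quality", "blurry", "worst quality"]


def retag(caption, core_tag, hair_tag, eye_tag, low_quality=False):
    """Apply the solo-character tagging steps to one comma-separated caption."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    kept, has_hair, has_eyes = [], False, False
    for tag in tags:
        if tag in REMOVE:
            continue
        # Heuristic: any "... hair" tag is a hair color/style tag to collapse.
        if tag.endswith(" hair"):
            has_hair = True
            continue
        if tag.endswith(" eyes") and tag.split()[0] in EYE_COLORS:
            has_eyes = True
            continue
        kept.append(tag)
    # Core tag first, then the style switch, then whatever is left.
    out = [core_tag, "anime screencap"] + kept
    # Replace all hair/eye tags with the single unique tag, only if they were there.
    if has_hair:
        out.append(hair_tag)
    if has_eyes:
        out.append(eye_tag)
    if low_quality:
        out += LOW_QUALITY
    # dict.fromkeys drops duplicates while keeping order.
    return ", ".join(dict.fromkeys(out))
```

I put the core tag first because, with `shuffleCaption: true` and `keepTokens: 1` in the Level 4 configs, kohya keeps the first comma-separated tag in place and shuffles the rest.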
Couple NORMAL CHARACTER DATASET
Tagging style differs across LoRA creators. For me:
I have two tags present in all images: the core tag and anime screencap.
Core tag = Name and Name + [what kind of couple] + focus. This also works with captions.
Anime screencap = For switching the dataset's average style on and off
Send your images to an autotagger (CIVIT’s Trainer has one)
Remove all biology-related tags (skin color, torso features, leg features) except for the genitals.
What I generally remove:
male focus, 1boy, muscular, muscular male, toned, toned male, bara, nipples, no nipples, pectorals, large pectorals, breasts, large breasts, medium breasts, small breasts, abs, navel, collarbone, thighs, thick thighs, biceps, arms, thick arms, mature male, manly, stomach, multiple boys, yaoi, hetero, couple, dark skin, dark-skinned male, ass, multiple penises, parody, style parody, official style, fake screenshot, virtual youtuber, anime coloring, no humans, horns, digimon (creature), monster, colored skin, duel monster, pokemon (creature), pointy ears, thick eyebrows, fat man, shota, child, loli, old, old man, father and son, father and daughter, rape, elf, orc, robot, mecha, centaur, fish boy, monster boy, devil, devil boy, devil girl, mermaid, merman, animification, scene reference, cosplay, fantasy, science fiction, horror (theme), furry male, fine art parody, pale skin, multiple girls, multiple others, yuri, cyborg, dark-skinned female, mother and daughter, daughter, mother and son, incest, husband and wife, mature female, height difference, size difference, age difference, siblings, sideburns, long sideburns, sanpaku, fujimaru ritsuka (male), satou kazuma, natsuki subaru, uzumaki naruto, uchiha sasuke, kaito (vocaloid), hair between eyes, bangs, parted bangs, archer (fate), undercut, monkey d. luffy, roronoa zoro, sanji (one piece), male child, super saiyan, viktor nikiforov, producer (idolmaster), kyon, reiner braun, eren yeager, cyan skin, purple skin, multicolored skin, antenna, 1girl, 2boys, 2girls, GONNA ADD MORE IF THERE ARE CHARACTER NAMES OR CONFLICTING TAGS I HAVE NOT YET ADDED HERE
Inconsistent elements that appear in fewer than 15 images should be tagged
Change/prune the character’s hair tags to one single unique tag: Hair color_character name_hair
NOTE: If you have images with other characters, remove their hair colors/styles
Change/prune the character’s eye tags to one single unique tag: eye color_character name_eyes
NOTE: If you have images with other characters, remove their eye color.
Add the anime screencap
Add “low quality, blurry, worst quality” on blurry/low quality images
Review all tags
Send to Training, see level 4
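To apply the "fewer than 15 images" rule, it helps to count how many captions each tag shows up in. A short sketch, assuming the same comma-separated captions; treating tags at or above the cutoff as consistent elements (so they ride on the core tag instead of being tagged) is my reading of the rule, and the function names are my own.

```python
from collections import Counter


def tag_frequencies(captions):
    """Count in how many captions each tag appears (at most once per caption)."""
    counts = Counter()
    for caption in captions:
        counts.update({t.strip() for t in caption.split(",") if t.strip()})
    return counts


def split_by_consistency(captions, cutoff=15):
    """Return (inconsistent, consistent) tag lists using the image-count cutoff."""
    counts = tag_frequencies(captions)
    inconsistent = sorted(t for t, n in counts.items() if n < cutoff)
    consistent = sorted(t for t, n in counts.items() if n >= cutoff)
    return inconsistent, consistent
```

Feed it the text of every caption file, then keep the first list tagged and look hard at the second one.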
VISION #2 = I WANNA MAKE BACKGROUNDS / ASSETS FOR CHARACTERS!
STEP 1: Find images of your chosen background / asset (object)
They are the constant focus element of images in various actions / interactions
They may even be blurry or low quality
STEP 2: Identify total image count
If you have fewer than 15 images, refer to the LIMITED dataset in Level 2
If you have 15 or more images, refer to the NORMAL dataset here in Level 1
STEP 3: Image Curation (Takes up 80% of your LoRA creation time unless you have a scraper)
For LoRAs covering more than 2 backgrounds / assets, see Level 3
BACKGROUND DATASET
If it's indoors, focus on indoor shots; if it's outdoors, focus on outdoor shots. If you want both, proceed to Level 3
Shows different natural lighting (day, afternoon, night)
Shows environmental interaction (raining, snowing, fire, thunder, etc.)
Shows characters / animals (not in focus) together with the background
Shows different viewing angles / perspectives
Include cropped sections of the background
ASSET DATASET
Shows different natural lighting (day, afternoon, night)
Shows environmental interaction (raining, snowing, fire, thunder, etc.)
Shows characters / animals together with the asset
Shows different viewing angles / perspectives
Shows the asset in different backgrounds
Include cropped sections of the asset
STEP 4: Tag them Up! (Takes up 20% or less of your LoRA creation time especially with an autotagger)
Tagging style differs across LoRA creators. For me:
I have two tags present in all images: the core tag and anime screencap.
Core tag = (Name + focus). This also works with captions.
Anime screencap = For switching the dataset's average style on and off
BACKGROUND DATASET
Send your images to an autotagger (CIVIT’s Trainer has one)
Remove all tags related to your BG
Ex. A cityscape BG - remove all building and sky tags
Ex. A living room - remove all furniture tags
Inconsistent elements that appear in fewer than 15 images should be tagged
Add the core tag
Ex. A cyberpunk cityscape BG - cyberpunk scenery-cityscape focus
Ex. A goth indoor living room - goth interior scenery-living room focus
Add the anime screencap
Add “low quality, blurry, worst quality” on blurry/low quality images
Review all tags
Send to Training, see level 4
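For backgrounds, "remove all tags related to your BG" is easier with keyword matching than with an exact list, since autotaggers spit out many variants (`building`, `tall building`, `cityscape`, `blue sky`...). A sketch assuming a hand-picked, hypothetical keyword list for a cityscape; substring matching is my choice and can also catch unrelated tags that happen to contain a keyword, so review what it removes.

```python
# Hypothetical keywords for a cityscape background; pick your own per dataset.
BG_KEYWORDS = ["building", "skyscraper", "sky", "city", "street"]


def strip_background_tags(caption, core_tag, keywords=BG_KEYWORDS, low_quality=False):
    """Drop every tag containing a background keyword, then put the core tag first."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    kept = [t for t in tags if not any(k in t for k in keywords)]
    out = [core_tag, "anime screencap"] + kept
    if low_quality:
        out += ["low quality", "blurry", "worst quality"]
    # Remove duplicates while keeping order.
    return ", ".join(dict.fromkeys(out))
```

Whatever survives (weather, characters, lighting) is what the trainer is told is "not the background".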
ASSET DATASET
Send your images to an autotagger (CIVIT’s Trainer has one)
Remove all tags related to your asset
Ex. A wooden chair - remove all characteristics of the wooden chair (wood, chair, etc.)
Inconsistent elements that appear in fewer than 15 images should be tagged
Add the core tag
Ex. A chair with a skull - chair with skull (object) focus
Ex. A bed with teeth - bed with teeth (object) focus
Add the anime screencap
Add “low quality, blurry, worst quality” on blurry/low quality images
Review all tags
Send to Training, see level 4
VISION #3 = I WANNA MAKE A CONCEPT / STYLE FOR EVERYTHING!
STEP 1: Find images showing the concept / style.
Has the general vibe of the concept / style
They may even be blurry or low quality
STEP 2: Identify total image count
If you have fewer than 35 images, refer to the LIMITED dataset in Level 2
If you have 35 or more images, refer to the NORMAL dataset here in Level 1
STEP 3: Image Curation (Takes up 80% of your LoRA creation time unless you have a scraper)
For LoRAs covering more than 2 concepts / styles, see Level 3
CONCEPT DATASET
Shows the concept with characters (1girl, 1boy, 2girls, 2boys, 1other, animals, monsters, etc.)
BODY CONCEPTS
Shows the concept in different angles / perspectives
Shows the character / environment interacting with the concept
Ex. Sharingan eye looking at a wall
Ex. mutated hand grabbing things
About 5% of the total images focus closely on the body concept
ACTION / POSE CONCEPTS
Shows the concept in different angles / perspectives
Shows the character / environment interacting with the concept
Ex. A unique standing pose on the peak of Mt. Everest, looking at a wall
Ex. A unique couple position while it is raining
Shows a black/white silhouette of the concept
STYLE DATASET
Any images with the general vibe of the style are okay!
STEP 4: Tag them Up! (Takes up 20% or less of your LoRA creation time especially with an autotagger)
Tagging style differs across LoRA creators. For me:
I have two tags present in all images: the core tag and anime screencap.
Core tag = (Name + focus). This also works with captions.
Anime screencap = For switching the dataset's average style on and off
CONCEPT DATASET
Send your images to an autotagger (CIVIT’s Trainer has one)
Remove all tags related to your concept
Ex. NSFW-focused concept - remove all tags for the body part your NSFW LoRA focuses on and the parts around it
Ex. Sharingan / mutated hand - remove all eye and arm tags
Inconsistent elements that appear in fewer than 15 images should be tagged
Add the core tag
Ex. NSFW focused concept - NSFW action focus
Ex. Sharingan / mutated hand - 1other with sharingan and mutated hand focus
Add the anime screencap
Add “low quality, blurry, worst quality” on blurry/low quality images
Review all tags
Send to Training, see level 4
STYLE DATASET
Send your images to an autotagger (CIVIT’s Trainer has one)
TAG EVERYTHING or NOT. The latter will need more epochs
Add the anime screencap
Add “low quality, blurry, worst quality” on blurry/low quality images
Review all tags
Send to Training, see level 4
LEVEL 2: LIMITED DATASETS
This is where sources are scarce or nonexistent, so I now rely on concept bleeding and NSFW bias.
SINGLE IMAGE LoRA
Do you only have one image?
I also made my OC's LoRA from 1 image. But it did not take just one version! There were three! Here is the third and final version: https://civitai.com/models/1464742
I only know one approach to single-image training: LET THE NSFW BLEED! This also works with concepts, backgrounds, and assets.
1st version:
15-20 total images is the goal!
STEP 1: Finding that One Image
STEP 2: Crop that single image for every element you want the trainer to learn.
For characters - I crop the head, then the eyes, nose distance, forehead distance, hair, etc., then the body, etc. I also include crops of the character's lateral and transverse anatomy.
For body concepts - I mainly focus on that body concept and train it as if it were an asset
For action concepts - The body parts involved in the action are cropped
For backgrounds - Cut it like it's pizza
For assets - Crop it laterally and transversely, like you are cutting a mannequin
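For the "cut it like pizza" background crops, a grid with some overlap keeps details at the seams from being sliced in half. A stdlib sketch that returns `(left, top, right, bottom)` pixel boxes, the same box order Pillow's `Image.crop()` takes; the function name and the 2×2 grid / 20% overlap defaults are my assumptions.

```python
def grid_crops(width, height, rows=2, cols=2, overlap=0.2):
    """Split a width x height image into rows x cols overlapping crop boxes.

    Boxes are (left, top, right, bottom) in pixels. `overlap` is the fraction
    of a tile shared with its neighbour (0 means no overlap).
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Solve the tile size so `cols` tiles, each shifted by (1 - overlap) of a
    # tile, exactly span the full width (same idea for rows / height).
    tile_w = width / (1 + (cols - 1) * (1 - overlap))
    tile_h = height / (1 + (rows - 1) * (1 - overlap))
    step_w = tile_w * (1 - overlap)
    step_h = tile_h * (1 - overlap)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = round(c * step_w)
            top = round(r * step_h)
            # Clamp so float rounding never pushes a box past the image edge.
            right = min(width, round(c * step_w + tile_w))
            bottom = min(height, round(r * step_h + tile_h))
            boxes.append((left, top, right, bottom))
    return boxes
```

With Pillow installed, `[img.crop(box) for box in grid_crops(*img.size)]` gives the tiles. Faces, eyes, and other character crops are still best picked by hand.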
STEP 3: Cloning for Altered Images
Clone every single cropped image you made
Edit the orientation, aspect ratio, and angle
Ex. a horizontally cropped lateral image of a bunny → edited the angle, orientation, and aspect ratio
Add a color overlay filter, GRAYSCALE IS A MUST!
OPTIONAL STEP: Add other related images from any source.
For Daberry's first version, I generated a blue PP and added it to the dataset, allowing NSFW to bleed.
STEP 4: Tag the elements like a normal dataset. This time, manual tagging is the best approach.
If the images have a constant background, add “crossover”
STEP 5: Proceed to Level 4 for Training
STEP 6: Proceed to Level 5 for Prompting
2nd version
25-35 images is the goal!
STEP 1: GENERATE LOTS OF NSFW IMAGES from Level 5
If it's a background, clothing, or an asset, try to add some skin. LET THE NSFW BLEED IN
STEP 2: Tag them up with the same core tag and unique tags
STEP 3: Send to Training
FINAL SAY: If you can generate with the LoRA at 0.8 strength, there's no need to proceed with a 3rd version unless you wanna.
3+ PEOPLE IN ONE IMAGE LoRA
The trickiest thing in AI generation is specifying multiple subjects, so this dataset is even trickier than the already tricky ones. Luckily, I can confirm that collaging multiple solos together works.
GOAL: 30-70 images
STEP 1: Find as many images of the characters together as you can. If there aren't enough, get their solos and create collages.
STEP 2: Tag in the same way normal couple LoRAs are tagged.
The core tag has the names of the characters + (number and gender here) focus
Ex: luke pearce and artem wing and vyn richter and marius von hagen_4boys focus,
STEP 3: Send to Training
STEP 4: Prompting
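The multi-character core tag is the names joined with "and", then an underscore, the count + gender, and "focus". A small helper keeps it identical across every caption; the function name and the `label` override for mixed groups (like "2boys 1girl") are my own assumptions.

```python
def group_core_tag(names, gender="boys", label=None):
    """Build a core tag like 'a and b and c_3boys focus'.

    `label` (hypothetical) overrides the count + gender part for mixed groups.
    """
    group = label if label is not None else f"{len(names)}{gender}"
    return " and ".join(names) + "_" + group + " focus"
```

`group_core_tag(["luke pearce", "artem wing", "vyn richter", "marius von hagen"])` reproduces the example core tag above.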
LEVEL 3: BUNDLES OF CONVENIENCE
Here is where I asked, “can I have a minicheckpoint???”
Answer: yes. But single LoRAs are better at fully encapsulating your vision. With bundles, the chance of concepts bleeding into each other is high, which is why bundles work best when the intended generations are subconcepts of the overall concept the LoRA is envisioned for.
CHARACTER BUNDLE:
GOAL: 69 images per character
STEP 1: Gather the Characters!
STEP 2: Curate as if you are curating a normal dataset.
STEP 3: Tag as if you are tagging a normal character. YES, do not let a single biological tag be present on two characters, otherwise the bundle will be a mess
STEP 4: Send to Training
STEP 5: Prompting
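For character bundles, the "no biological tag on two characters" rule is easy to check automatically: collect each character's tags and look for overlaps. A sketch, assuming you already have one tag set per character (for example, everything from that character's captions); ignoring only `anime screencap` by default is my assumption, since it's meant to be shared, and the character names in the usage are hypothetical.

```python
from collections import defaultdict


def shared_tags(tags_by_character, ignore=("anime screencap",)):
    """Return {tag: [characters...]} for every tag used by more than one character."""
    owners = defaultdict(set)
    for character, tags in tags_by_character.items():
        for tag in tags:
            if tag not in ignore:
                owners[tag].add(character)
    return {tag: sorted(chars) for tag, chars in owners.items() if len(chars) > 1}
```

An empty result means no tag is shared; anything it returns (like a plain `green_eyes` on two characters) should become a unique per-character tag.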
BACKGROUNDS BUNDLE:
GOAL: 45 images per Background
STEP 1: Gather the Backgrounds!
STEP 2: Curate as if you are curating a normal dataset.
STEP 3: Tag as if you are tagging a normal dataset.
STEP 4: Send to Training
STEP 5: Prompting
COSTUME/CLOTHING BUNDLE:
GOAL: 30 images per costume / clothing item
STEP 1: Gather the Costumes / Clothing!
STEP 2: Curate as if you are curating a normal dataset.
STEP 3: Tag as if you are tagging a normal dataset.
STEP 4: Send to Training
STEP 5: Prompting
CONCEPT BUNDLE:
GOAL: 25 images per Concept
STEP 1: Gather the Concepts!
STEP 2: Curate as if you are curating a normal dataset.
STEP 3: Tag as if you are tagging a normal dataset.
STEP 4: Send to Training
STEP 5: Prompting
LEVEL 4: TRAINING PARAMETERS
All Character Datasets
I aim for 1100-1200 total steps in 1 Epoch.
{
"engine": "kohya",
"unetLR": 0.0005,
"clipSkip": 2,
"loraType": "lora",
"keepTokens": 1,
"networkDim": 16,
"numRepeats": "THIS WILL DIFFER (depends on your image count)",
"resolution": 1024,
"lrScheduler": "cosine_with_restarts",
"minSnrGamma": 5,
"noiseOffset": 0.1,
"targetSteps": 1120,
"enableBucket": true,
"networkAlpha": 8,
"optimizerType": "Adafactor",
"textEncoderLR": 0.00005,
"maxTrainEpochs": 1,
"shuffleCaption": true,
"trainBatchSize": 3,
"flipAugmentation": false,
"lrSchedulerNumCycles": 3
}
All Concepts / Backgrounds / Style / Assets / Bundles
{
"engine": "kohya",
"unetLR": 0.0005,
"clipSkip": 2,
"loraType": "lora",
"keepTokens": 1,
"networkDim": 8,
"numRepeats": "THIS WILL DIFFER (depends on your image count)",
"resolution": 1024,
"lrScheduler": "cosine_with_restarts",
"minSnrGamma": 5,
"noiseOffset": 0.1,
"targetSteps": 1105,
"enableBucket": true,
"networkAlpha": 8,
"optimizerType": "Adafactor",
"textEncoderLR": 0.00005,
"maxTrainEpochs": 1,
"shuffleCaption": true,
"trainBatchSize": 3,
"flipAugmentation": false,
"lrSchedulerNumCycles": 3
}
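Since `numRepeats` is the value that changes, here's the arithmetic I'd use to pick it. It assumes steps per epoch = ceil(images × repeats ÷ batch size), which is how kohya-style trainers count when all images land in one bucket; with `enableBucket: true`, mixed aspect ratios can add a few extra steps because each bucket fills its own batches, so treat the result as a starting point. The function names are my own.

```python
def steps_for(images, repeats, batch_size=3, epochs=1):
    """Total training steps, counting a partial last batch as a full step."""
    per_epoch = -(-(images * repeats) // batch_size)  # ceiling division
    return per_epoch * epochs


def pick_repeats(images, target_steps=1120, batch_size=3, epochs=1):
    """Repeat count that lands near target_steps (defaults match the character config)."""
    # Round to the nearest whole repeat; the step total ends up close to, not always exactly, the target.
    return max(1, round(target_steps * batch_size / (images * epochs)))
```

With 70 images, `pick_repeats(70)` gives 48 repeats and `steps_for(70, 48)` lands exactly on 1120. For the concept/background config, pass `target_steps=1105`.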
LEVEL 5: LoRA Flexibility + Prompting Skill Check
The LoRA has finished training, so you wanna test it immediately. Use checkpoints that you feel do not have an overwhelming bias toward anything. Oftentimes, a LoRA's potential cannot be realized because of low-quality prompting. See prompting guides, because my only advice is to keep the prompt simple.
NOW TO LORA CHECKING:
Can it do what you envisioned?
If yes, Yayy! You’ve got yourself a catering LoRA.
But how flexible is it?
If character is able to do a pose it is not trained on, Yayy
If the background can include something (or someone) doing something, Yayy
If clothing / assets can be used, broken, or turned into monster, Yayy
If concept / style can cater to aliens with the most morbid and gory features, Yayy
If that's all Yayy, can other LoRAs stack with it? NOTE: this test also depends on the other LoRAs' flexibility, so take it slow.
You want to check if your character LoRA can be paired with another character LoRA + a fighting concept LoRA. If Yayy then Congrats!!
You want to check if the background can be placed inside a picture frame within a living room LoRA (LoRA inception). If Yayy, then SLAYYY!
You want to check if concept / style can merge with other LoRA concepts / styles, if Yayy then Hooray!
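To keep the flexibility and stacking checks systematic, I'd generate a small grid of test prompts. A sketch assuming the AUTOMATIC1111-style `<lora:filename:weight>` prompt syntax (other UIs, like ComfyUI, load LoRAs differently); the function names, LoRA filenames, core tag, and scenarios are hypothetical placeholders.

```python
from itertools import product


def lora_tag(name, weight):
    """A1111-style LoRA prompt tag, e.g. <lora:my_lora:0.8>."""
    return f"<lora:{name}:{weight}>"


def build_check_prompts(core_tag, lora_name, scenarios, weights=(0.6, 0.8, 1.0), extra_loras=()):
    """One prompt per (weight, scenario) pair, optionally stacking other LoRAs.

    extra_loras is a sequence of (name, weight) pairs kept at fixed weights.
    """
    prompts = []
    for weight, scenario in product(weights, scenarios):
        parts = [lora_tag(lora_name, weight)]
        parts += [lora_tag(name, w) for name, w in extra_loras]
        parts += [core_tag, scenario]
        prompts.append(", ".join(parts))
    return prompts
```

For example, `build_check_prompts("zog goblin focus", "zog_goblin", ["sitting on a throne"], extra_loras=[("fight_concept", 0.7)])` gives three prompts (0.6, 0.8, 1.0) with the fighting concept LoRA stacked on each. Paste them into your UI and check poses, scenes, and stacks the LoRA wasn't trained on.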