Tricks about DATA prep [Character] - by a noob for all noobz

Made by a noob for all noobz

Because generating tons of pictures can get boring, it's time to create some LoRAs: it's Wifuuu time! The first thing you need to know is that creating a character LoRA is easy but takes time. Don't be scared of the process, because having the best data will compensate for your shitty training skills. This article will give you some of the best tricks for preparing that data.

1. [SELECT YOUR WAIFU]

2. [DATA SET RESOURCES]

3. [DATA SET CAPTURE]

4. [DATA SET SELECTION]

5. [DATA SET CREATION]

6. [DATA SET CLEANING]

7. [DATA SET CROPPING]

8. [DATA SET TAGGING]

9. [V1 TO V2 UPDATE]

10. [UPGRADE MODEL]

1. [SELECT YOUR WAIFU]

Because bringing your waifu to life takes a lot of time, first check whether a good model of her already exists. Once you have found your beloved waifu, contact the publisher or the studio and tell them "I love your character blablabla... I'm gonna make a LoRA of it". This will help you in the future... If they don't respond, it's considered a yes :)

Now determine the unique style you want (Anime / 3D / Manga / Comics...).

Choose the main outfit of the character: I prefer to train only on the main outfit because it takes less time and I don't care about secondary outfits. Sometimes, though, characters evolve like a Pokémon, so you need to train with multiple outfits. If you want the best results, do not add secondary outfits, or make a separate LoRA for them.

2. [DATA SET RESOURCES]

Find it / Buy it.

Take data in one style only - do not mix styles. The only exception is when the data set is so small (generally NSFWaifus) that you need to apply necromancy and convert / modify some alternative data with AI tools (for example, convert CG data into anime style).

3. [DATA SET CAPTURE]

ffmpeg is free and open-source software that can extract all the frames from a video. You just have to create a script (.bat), put the video in it, and it will create a folder with all the frames inside. Ask an AI how to write your perfect script(s), because if your video is native 4K, a script that captures at 1080p will be better than capturing all frames in 4K.

- Capturing in JPEG = low quality = tiny size folder

- Capturing in PNG = best quality = huge size folder

If you want the best of the best, capture everything at native resolution, but you will need a huge SSD. It will help you make the best stitches, because better data = better stitches.

Do not take data from a video that has hard-coded (burned-in) subtitles!
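Here is a minimal sketch of the capture script described above, written in Python instead of a .bat file. It assumes ffmpeg is installed and on your PATH. The function names, the `frame_%06d` file pattern and the "folder named after the video" choice are my own assumptions, not something ffmpeg requires.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video, out_dir, fmt="png", height=None):
    """Build an ffmpeg command that dumps the frames of `video` into `out_dir`.

    fmt    -- "png" (best quality, huge folder) or "jpg" (lower quality, small folder)
    height -- optional target height, e.g. 1080 to downscale a 4K source
    """
    if fmt not in ("png", "jpg"):
        raise ValueError("fmt must be 'png' or 'jpg'")
    cmd = ["ffmpeg", "-i", str(video)]
    if height is not None:
        # scale=-2:H keeps the aspect ratio and rounds the width to an even number
        cmd += ["-vf", f"scale=-2:{height}"]
    if fmt == "jpg":
        # For JPEG output, a lower -q:v value means better quality
        cmd += ["-q:v", "2"]
    # Output files: frame_000001.png, frame_000002.png, ...
    cmd.append(str(Path(out_dir) / f"frame_%06d.{fmt}"))
    return cmd


def extract_frames(video, fmt="png", height=None):
    """Create a folder named after the video and fill it with frames."""
    video = Path(video)
    out_dir = video.with_suffix("")  # "episode01.mkv" -> folder "episode01"
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video, out_dir, fmt, height), check=True)
    return out_dir
```

For a 4K video you could call `extract_frames("episode01.mkv", fmt="png", height=1080)`; leave `height` out to capture at native resolution.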

4. [DATA SET SELECTION]

Tons of scrolling, because you will have to select the best frames. At the same time, find the stitches and put them in separate folders (1 stitch = 1 folder). Do not delete empty background frames: they can help you remove unwanted moving elements around the character.

5. [DATA SET CREATION]

Software that can stitch your frames together (vertical / horizontal / diagonal panning shots):

- Image Composite Editor

- AutoStitch

- Overmix

Use panning shots to create the best-quality data. Sometimes it is hard to connect the lower frame to the upper one because the colors don't match; an AI tool can perhaps solve this huge issue by editing the colors.
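Before reaching for an AI tool on the color mismatch, a simple non-AI trick worth trying is per-channel gain matching: measure the average color of the overlapping strip in both frames and scale one frame to match the other. This is only a sketch of that swapped-in technique, not what the stitching tools above do internally. It assumes pixels are plain `(r, g, b)` tuples with values in 0-255, and the function names are made up for the example.

```python
def channel_means(pixels):
    """Average R, G, B over a list of (r, g, b) tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def match_colors(frame, frame_overlap, reference_overlap):
    """Scale each channel of `frame` so its overlap strip matches the reference.

    frame             -- all pixels of the frame to correct
    frame_overlap     -- pixels of that frame inside the shared region
    reference_overlap -- pixels of the neighbouring frame in the same region
    """
    src = channel_means(frame_overlap)
    ref = channel_means(reference_overlap)
    # One gain per channel; leave a channel untouched if its source mean is 0.
    gains = [r / s if s else 1.0 for s, r in zip(src, ref)]
    # Apply the gains and clamp every value back into the 0-255 range.
    return [
        tuple(min(255, max(0, round(v * g))) for v, g in zip(px, gains))
        for px in frame
    ]
```

A global gain like this only fixes a uniform brightness or tint shift between two frames; gradients or local lighting changes still need manual or AI editing.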

- Capture frames of other girl(s) if they wear the same clothes as your model (perhaps they have better data of your main character's clothes)
- Feet problem: sometimes it's really hard to get a full-body shot, which is why you need AI tools to add feet to your character
- Use AI tools to complete missing parts (for example, remove a panel and complete the missing part(s) of a character = bigger data = better crop) or to increase the quality of a low-quality frame

6. [DATA SET CLEANING]

Use AI tools to remove: backgrounds (but not if the scene is at night or in the evening, or if there are too many shadows on your character) / blur / bloom / speed lines / other unwanted characters / objects / effects...

7. [DATA SET CROPPING]

- Put your character's body (torso) in the middle of the frame, even if you need to cut off some parts of her / him (sometimes you can't)
- Crop all your pictures to a maximum of 1024p (1024x??? or ???x1024)
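The two cropping rules above come down to a bit of arithmetic, sketched here. I am assuming "max 1024p" means the longest side ends up at 1024 pixels at most, and that you pick the torso position yourself (in pixels). The function names are hypothetical.

```python
def crop_box_centered(img_w, img_h, center_x, center_y, crop_w, crop_h):
    """Return (left, top, right, bottom) for a crop centered on the torso.

    The box is shifted back inside the image when the torso sits near an edge,
    so the torso ends up as close to the middle as the frame allows.
    """
    crop_w, crop_h = min(crop_w, img_w), min(crop_h, img_h)
    left = min(max(center_x - crop_w // 2, 0), img_w - crop_w)
    top = min(max(center_y - crop_h // 2, 0), img_h - crop_h)
    return left, top, left + crop_w, top + crop_h


def fit_longest_side(width, height, max_side=1024):
    """Scale (width, height) so the longest side is at most `max_side`."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

If you use Pillow, the box tuple can go straight into `Image.crop(box)` and the size tuple into `Image.resize(size)`.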

8. [DATA SET TAGGING]

- Prepare a list of all the tags that perfectly describe your character
- Use an AI tool for auto-tagging (the Civitai On-Site LoRA Trainer is amazing, but you will have to correct bad or missing tags)
- Put the tags in a good order (character features > clothing > poses and expressions)
- Tag the style
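Fixing the tag order by hand on every caption gets old fast, so here is a small sketch that reorders auto-tagger output into character features > clothing > poses and expressions. Putting the style tag after those three is my own assumption (the order above does not say where it goes). The category names and the `categories` dictionary are hypothetical; you would build it from your own tag list.

```python
def order_tags(tags, categories,
               order=("character", "clothing", "pose_expression", "style")):
    """Sort tags by category, keeping the original order inside each category.

    tags       -- tags as produced by an auto-tagger
    categories -- hand-made dict mapping a tag to its category
    Unknown tags are placed last so they are easy to spot and review.
    """
    rank = {name: i for i, name in enumerate(order)}
    unknown = len(order)
    # sorted() is stable, so tags with the same rank keep their input order.
    return sorted(tags, key=lambda t: rank.get(categories.get(t), unknown))


categories = {
    "blue hair": "character", "red eyes": "character",
    "school uniform": "clothing",
    "sitting": "pose_expression", "smile": "pose_expression",
    "anime": "style",
}
raw = ["smile", "school uniform", "anime", "blue hair", "outdoors", "sitting", "red eyes"]
caption = ", ".join(order_tags(raw, categories))
print(caption)
# blue hair, red eyes, school uniform, smile, sitting, anime, outdoors
```

"outdoors" lands at the end because it is not in the dictionary, which is a handy reminder to either categorize it or delete it.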

9. [V1 TO V2 UPDATE]

At this point, be the judge: to create a V2, use only the best pictures you generated, the ones that are 99% accurate to the original character.

A V1 exists to fill in the missing parts of the training dataset for a perfect V2. Let me explain: an NSFW character is going to be missing a lot of data, and a SFW character is going to be missing NSFW data. The V2 is needed for an ultimate version because it will have good SFW and NSFW data (full freedom and creativity).

- Use AI image generation to complete the parts of the clothes that don't have much data; generally that's the back of the character and the feet
- YOU WILL ALSO BE THE JUDGE of the missing parts, like underwear, for total control of the NSFW part, so try to find the best style / color(s) for this underwear :)
- For the underwear, look for data of your character in other styles (CG, GAME...); these sources may give you hints on how to make the missing underwear in anime style, for example

10. [UPGRADE MODEL]

Upgrade when something better / easier than the current model comes out, or wait a few years to see the limits of AI. That way you can focus on making more wifuuuuuuuuuuuuuuu.

I made this article because I love Civitai and I see tons of bad models that do not respect the original work. If I find other tricks for getting better data, I will update the article !!!!!!