Type: Other
Stats: 480
Reviews: (183)
Published: May 16, 2023
Base Model:
Training: Steps: 1, Epochs: 1
Hash: AutoV2 D2B356239D
Hello friends!
Lots of questions about how to train a grid Lora, so I'll make a quick guide to my workflow. First, let me say that we (with @Bartuba) are making an app that will be a huge help for all of you who want to create a Lora (actually any kind of Lora, not just grids). I'll post the link here as soon as we are ready to share the MVP, and I'll prepare a larger tutorial accumulating all of the experience I have by then.
Overview
So to train a Lora with grids you need to do this:
Gather a consistent dataset with decent variety of your concept;
Prepare images in your dataset for the training;
Tag your dataset;
Train a model;
Check the results and repeat the whole process if needed.
Gathering data
First of all you need a bunch of clips that represent your concept. I suggest finding good-quality clips (with a resolution of at least 1280x720). Gifs from danbooru are ok - they are mostly good quality - but I suggest not using them, because gifs usually have a low number of colors, which is not good for training.
How to do that?
So if we are talking about a grid Lora, you need a bunch of clips. Well, actually we need frames from those clips, and there are lots of ways to get them. Here are some of them:
Pick frames manually while watching clips frame by frame (for example, KMPlayer can do that: press "F" (to pay respect) to see the next frame, and press "Ctrl + A" to save it on your computer). That's a long and boring process, but that way you can guarantee that your images will have decent quality and variability;
Split the whole clip (if it's a 10-second short or so) into frames and pick the ones you like. For example, you can do that on that site: just put your gif/webm there, split it into frames, then download all of them as a zip;
Another way is to download ffmpeg (here is the guide on how to install it on Windows), then stack all of your clips in the same folder, press "Windows + R", type "cmd", then cd {path to your folder}, and use my script:
for %i in (*.webm) do ffmpeg -ss 0.5 -i "%i" -fps_mode vfr -frame_pts true -vframes 9 "%iout-%02d.png"
Where:
(*.webm) is the extension of the files;
-ss 0.5 skips the first 0.5 seconds of the clip before frames are taken;
-vframes 9 is the number of frames taken from every clip.
Most of the time it gives pretty decent results.
*Our UI will help with that part in the future.
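If you prefer Python over a cmd one-liner, the same loop can be sketched with subprocess. This is a minimal sketch, not the app we're building: the helper names (ffmpeg_cmd, extract_all) are mine, and you may want to tweak the flags per clip.

```python
import subprocess
from pathlib import Path

def ffmpeg_cmd(clip: Path, start: float = 0.5, frames: int = 9) -> list[str]:
    """Build the ffmpeg call that grabs `frames` frames from `clip`,
    skipping the first `start` seconds (mirrors the cmd one-liner above)."""
    out_pattern = f"{clip.stem}-out-%02d.png"
    return [
        "ffmpeg", "-ss", str(start), "-i", str(clip),
        "-fps_mode", "vfr", "-frame_pts", "true",
        "-vframes", str(frames), out_pattern,
    ]

def extract_all(folder: str) -> None:
    # run ffmpeg for every .webm clip in the folder
    for clip in sorted(Path(folder).glob("*.webm")):
        subprocess.run(ffmpeg_cmd(clip), check=True)
```

Building the command as a list per clip makes it easy to, say, skip a longer intro on one clip without retyping the whole loop.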
What frames should I take?
Well, that's the first tricky part. Say you want to create a grid Lora for smiling. Pay attention to the following:
Your frames in the dataset will be small. Is the object or concept clearly visible? Is it large enough (at least 30% of the frame)?
Are your clips consistent at a quick glance? For example, if it's a smile, is it taken from almost the same angle? Too much variety may lead to a mess.
Is the dynamic of the "smile" clearly visible in the differences between frames? If there is too little variety, you may get 4 (or 9) identical images.
Can you crop to a square without losing the crucial details of the concept?
Are the crucial parts of the image free of logos?
If the answer to any of these questions is NO, think twice about that clip. Some issues can be fixed in Photoshop, but it's time consuming.
Preparing data
Ok, now we have a dataset of raw images. We need to do the following:
Clean them of bad parts (like logos);
Set the order;
Cut them into squares;
Resize them to 512x512 (or to 256x256);
Merge them into grids;
Resize the grids.
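The square cut from the list above is just coordinate math. Here is a small sketch (center_crop_box is a name I made up); the returned box happens to match the (left, upper, right, lower) order that Pillow's Image.crop expects, if you automate this step instead of using Photoshop.

```python
def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """(left, top, right, bottom) of the largest centered square
    inside a width x height frame."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)
```

Shifting left/top instead of centering lets you keep a subject that sits off-center in the frame.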
Cleaning and cutting
I use Photoshop for that. Boring stuff. Nothing to say here)
The only tip: stack the frames from one clip together and cut them all at once. Then use "Export Layers to Files". It saves time.
Frames should be in the same logical order for all of the grids. For example, the "smile" should get wider every frame for all of the clips. If it's vice versa, reverse it, or you'll get a mess in the order of the final results.
*If you are too lazy to do so, you can overcook your model. Overtraining kinda helps, because the final result will stick to one image from the dataset. It will cripple the results though.
*Our UI will help with that part in the future.
Resizing
I think everybody already knows it, but Birme is a good tool for that.
Merging
Again, there are a lot of ways to do it. Here are some:
Use Photoshop to do it manually. Just create an empty canvas of 1024x1024 (for 2x2 frames) or 1536x1536 (for 3x3 frames) and put the images one by one on that canvas. Then merge the layers with frames from one clip and again export the layers as separate files.
Or use the Python script (made by @Bartuba) from the attachment:
Put it in the folder with all of your prepared frames (they should have names with letters, and a resolution of 512x512);
Call cmd;
Type "python 0sq.py png 3x3" (where 0sq is the name of the script, png is the extension of the frames, and 3x3 is the type of the grid).
Resize the grids to 512x512 for 2x2 frames, and 768x768 for 3x3 frames.
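I don't have the attachment script in front of me, but the grid layout it builds can be sketched as pure coordinate math. grid_positions is a hypothetical name; the (x, y) pairs are what you'd hand to something like Pillow's Image.paste when placing each frame on the canvas.

```python
def grid_positions(n: int, tile: int = 512) -> list[tuple[int, int]]:
    """Top-left (x, y) paste coordinates for an n x n grid of square tiles,
    in row-major order: frame 0 top-left, frame n*n - 1 bottom-right."""
    return [((i % n) * tile, (i // n) * tile) for i in range(n * n)]
```

A 3x3 grid of 512px tiles gives a 1536x1536 canvas, which the final resize above brings down to 768x768, i.e. 256px per frame.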
That's it for now, folks :)
I made this guide to share my tools and make it a little bit easier for those who want to try training grid models too) I'm not sure yet about the next steps, so I'll leave it like this for now. I'll come back after our UI is ready.
Have a nice day)
Tagging
TBD
Training
TBD
Checking and analyzing the results
TBD
Feel free to ask any questions here in the comments, or in my discord channel.
Also, I am making games with AI art. They will be free in the future, but if you want to participate in making them or get early access, you can support me on patreon)
Also, my wet hair Lora was banned here, I dunno why. You can download the latest version of it from my patreon for free.