Guide(ish) for training a style + multiple characters in a single LORA: my tips and thoughts for creators on improving the content on this website
This is simply going to be some tips, both for people who want to start training and fine-tuning and for people who already have. Just a disclaimer: I'm by no means an expert, and I have no clue how all the coding and technical stuff works. I just use the tools and have had some good results, although I haven't shared everything I've made yet.
First, the elephant in the room: most people don't want a 2-6 GB checkpoint for a very specific concept. To me, checkpoints should be for general-use models, so unless you have a very specific reason, don't train checkpoint models for anything narrow; there are better and more portable alternatives, which brings me to my next point.
I don't think single character LORAs are worth it either. For my most recent LORA (the HxH LORA) I trained both a style and at least 4 characters (possibly one or two more) in a single LORA that's 144 MB, and it wasn't even hard. For all my LORAs I use Kohya's trainer via Linaqruf's colab; it's super easy and relatively fast. Personally, I caption my datasets with the Dataset Tag Editor extension in Auto1111: I let the DeepDanbooru interrogator do its job, then clean up and lightly edit the captions (not too much), and simply add the character's name to the caption of each picture of that character in the dataset (a small helper script for that step is sketched below). Then I took the captions and dataset to the colab linked above and used mostly default settings. For the epochs, honestly I'm not sure; experiment and see how it goes for you, the 20-50 range is probably good.
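For the character-name step, a tiny script saves a lot of clicking. This is only a hypothetical helper sketched for illustration, not part of the Dataset Tag Editor extension; the folder path and character name in the example are made up.

```python
# Hypothetical helper, not part of any extension: once DeepDanbooru has written
# one .txt caption per image, prepend the character's name as the first tag for
# every caption in that character's folder.
from pathlib import Path

def prepend_character_tag(caption_dir: str, character_name: str) -> None:
    for txt in Path(caption_dir).glob("*.txt"):
        tags = txt.read_text(encoding="utf-8").strip()
        if not tags.startswith(character_name):
            txt.write_text(f"{character_name}, {tags}", encoding="utf-8")

# Example with made-up names: prepend_character_tag("dataset/10_killua", "killua zoldyck")
```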
And that's it: instead of sharing a 2-4 GB checkpoint for a style, or four 144 MB LORAs for single characters, you can now do all of that in a single LORA. Obviously, training characters + style with just captioning worked well in this case because the characters are from the same series in the same style; I don't know if it works with different characters in different styles (although I think you can train different concepts at the same time in the colabs, so that's another solution for that case).
On a similar note, have a look at luisap's method. I tried it a couple of times for styles only and it worked well, even with small datasets (8 and 10 pics); you'll get 1 MB LORAs that work well. The only thing I would add: I also trained the text encoder (5e-5 learning rate), and the results were much better for me. I had captioned my dataset, so maybe that was the reason. I made the changes suggested by luisap, plus the text encoder, in the colab and uploaded it here if you're too lazy to apply the changes to the colab template yourself. (Edit: I think I forgot to change lr_scheduler from constant to cosine with restarts, so do that if you want to use it, and it's probably best to input the settings yourself into the colab found on GitHub, since it's regularly updated.) Just drop it into your Google Drive and run it. I only ran this method on 3 styles and it worked well on all of them, but I never tried it on characters, so I don't know how that would go.
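To make those tweaks concrete, here is a minimal sketch of what the underlying kohya-ss sd-scripts call looks like with the text encoder trained and the scheduler switched to cosine with restarts. The flag names are real train_network.py options, but every path and most of the values (dim, batch size, epochs) are placeholder assumptions; only the 5e-5 text-encoder LR and the scheduler change come from the paragraph above.

```python
# Minimal sketch (not the exact colab cell) of a kohya-ss sd-scripts run with the
# text encoder trained and cosine-with-restarts enabled. Paths and most numbers
# are placeholders; adjust them to your own dataset.
import subprocess

subprocess.run([
    "accelerate", "launch", "train_network.py",
    "--pretrained_model_name_or_path=/content/base_model.safetensors",  # placeholder base model
    "--train_data_dir=/content/dataset",     # folders named like "10_mystyle"
    "--output_dir=/content/output",
    "--network_module=networks.lora",
    "--network_dim=1", "--network_alpha=1",  # assumption: a very low rank keeps the file around 1 MB
    "--learning_rate=1e-4",                  # placeholder overall LR
    "--text_encoder_lr=5e-5",                # the text-encoder LR mentioned above
    "--lr_scheduler=cosine_with_restarts",   # instead of constant, as noted in the edit above
    "--resolution=512",
    "--train_batch_size=2",
    "--max_train_epochs=20",
    "--mixed_precision=fp16",
    "--save_model_as=safetensors",
    "--caption_extension=.txt",
], check=True)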
TI embeddings are great and have a very small file size, but unfortunately I've never managed to make one, so I can't help there.
Also, kudos to @kappa_neuro and the people extracting LORAs from checkpoints.
Edit: since many people are asking, if anyone volunteers to make a video guide, I would gladly help you over Discord to make the video.
21 Answers
If you read this and agree, please comment/link this on posts that would benefit from it (such as very specific models).
I wish I could see a video of this process to understand it better, but this post is greatly appreciated.
Thanks. I hope I'll have time on the weekend to take a look at those methods. It would definitely be better if we could avoid 2-6 GB files.
In my case, I am trying to do whatever works. It's fine to ask people to make smaller LoRAs, but your instructions aren't step-by-step and will leave people wondering and going back to the experimental phase, when they could instead just ignore you and do what has worked for them before.
I'd love a clear, step-by-step breakdown of how to make a 1-6 MB style LoRA that a) avoids the problem of faces and scenes bleeding through into all of the images, b) addresses every setting instead of just specifying a few, and c) is actually a more attractive option apart from file size. Currently, this guide(ish) isn't that.
TL;DR of your guide currently is: 'I did some vague thing LuisaP vaguely said to do, and didn't specify important things that ought to be specified by someone purporting to help people make their own LoRAs effectively... Yeah, then I kinda did some tagging thing, and epochs, I dunno, and yeah, go do it because you're cluttering this site and I don't like it.'
This is great for anime and styles, but I've yet to see a LORA trained on the likeness of a real-life subject that even remotely rivals Dreambooth in terms of quality or flexibility.
Change my mind.
Hi! I just want to thank you; after some tries I was able to create my very first LORA successfully. Thanks!
I don't think single character LORAs are worth it either
At dim/rank 128 I agree, but if you lower it to dim/rank ~16 the file size drops below 25MB and you still get nearly the same quality with character models. Non character models may differ at smaller size though.
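Rough arithmetic behind that file-size claim (an added sketch, not from the comment itself): LoRA weights scale roughly linearly with the network dim, so scaling the 144 MB / 128-dim figure from the original post down to dim 16 lands well under 25 MB.

```python
# Back-of-the-envelope check: LoRA file size is roughly linear in dim,
# extrapolating from the 144 MB @ dim 128 figure mentioned above.
size_at_dim_128 = 144.0  # MB
for dim in (128, 32, 16, 8):
    print(f"dim {dim}: ~{size_at_dim_128 * dim / 128:.0f} MB")
# dim 16 comes out around 18 MB, consistent with the "below 25MB" estimate.
```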
I fully agree that it's crazy to train a 3-7 GB model for a single character, or even a few characters. As you noted, models should be for overall style and quality, e.g. a model trained specifically on landscapes, or one trained specifically on characters. But getting a likeness seems to be the realm of Textual Inversion. I trained my first one this week and it wasn't hard; I watched this tutorial and now feel fully comfortable training TIs. With their small file size, it seems to me TIs are better suited to character likenesses than LORAs.
Adding Details on Training a Style
The more images there are, the better. For the art-style LoRAs on my profile I used 400+ images each, and for the Yakitomato and Imaizumi models around 700+ and 1200+ respectively, with 32 dim, 16 alpha, cosine with restarts, 2 repeats, batch size 4, and 3-4 epochs (see the step-count sketch below). They still managed to replicate the style perfectly; I've been able to generate characters that look exactly like the art style. Moreover, at 32 dim I noticed that when I trained on literal manga pages full of text bubbles and lots of characters, the captions contained tons of tags describing each element, and when I put those tags in the negative prompt I didn't get those elements; in fact, I didn't get any text bubbles at all. You'd be amazed to know the Yakitomato model was trained on 700+ manga images only. I can't share the dataset because of copyright; I just ripped all of his manga and trained on it.
So when training a style, go for lots of images.
You can try it yourself. I used NAI full final pruned for training and the WD1.4 tagger.
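For anyone translating those settings into training length, kohya counts steps as images × repeats × epochs ÷ batch size; a quick sketch with the numbers quoted above (added arithmetic, not from the comment itself):

```python
# Step-count arithmetic for the settings above: images * repeats * epochs / batch_size.
images, repeats, epochs, batch_size = 700, 2, 4, 4
steps = images * repeats * epochs // batch_size
print(steps)  # 1400 steps for the ~700-image run at 4 epochs
```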
I'm currently studying LoRA training; I can volunteer to do a video about this, but of course I still need to figure things out. I already have some experience with colab training, as well as RunPod training using EveryDream and StableTuner, and I've made a few videos on SD on my channel. Hope you can contact and help me on Discord! Rexel#6689
I don't think single character LORAs are worth it either
If you want more flexibility and accuracy, a single-character LoRA tends to be better and easier to train without overfitting than a multi-character one.
Style + character LoRAs are great if you want to draw those characters in that style... but they're bad if you just want to use the characters with another style, and you can hardly draw a character who shares some traits (hair/eye color) with one of the characters the LoRA was trained to replicate. (The text encoder is a double-edged sword, which is why I tend to disable it when I train a style LoRA.)
But I agree that single-character Dreambooth models are a stupid idea, and using 128+ dim to train a single-character LoRA is overkill (8-16 dim seems to be the sweet spot).
To add to @tsu's comment, this site also incentivizes individual LoRAs per character by only showing one preview image on the front page.
Please point me to how to create LoRAs on my phone! I have so many I want to make but am away from my PC most of the day, and remote desktop sux... please help!!!🙏🙏🙏
As someone who is beginning to review a lot of LoRAs on YouTube (and has made a few finetunes), here are a few things I've noticed:
- Backgrounds are often not captioned well, making LoRAs inflexible or requiring their strength to be dropped so far they don't work right if you want to change the background.
- Tags are often misspelled in the metadata (see the tag-frequency sketch after this list for a quick way to audit them).
- Many LoRAs are badly named, with the filename not representing the LoRA (I gather Civitai changes them, but there must be a method to it).
- Captions often mention color where they shouldn't. For example, if an orc is green, don't describe it as green unless you want the person using the LoRA to be able to change that; otherwise it just makes the color more likely to come out wrong in general use.
- LoRAs are often trained on specialist models. This is fine if you want a personal LoRA that will do the one thing you need, but if you want to make a more general LoRA, train on SD 1.5 or Anything V3 (or another basic anime model), not ChilloutMix, Counterfeit, or whatever.
- Hard to tell with this one, but train without a VAE unless you REALLY want to force a particular color (less relevant for style LoRAs).
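A quick way to catch the misspelled tags and stray color tags mentioned above is to dump a tag-frequency list for the whole dataset. This is only an illustrative sketch, not an existing tool, and the "dataset" path is a placeholder.

```python
# Rough caption audit: count how often each tag appears across every .txt caption
# so typos and unwanted color tags stand out in the frequency list.
from collections import Counter
from pathlib import Path

counts = Counter()
for txt in Path("dataset").rglob("*.txt"):  # placeholder dataset path
    counts.update(t.strip() for t in txt.read_text(encoding="utf-8").split(",") if t.strip())

for tag, n in counts.most_common(50):
    print(f"{n:5d}  {tag}")
```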
I still don't fully get how to train a multi-character LoRA from the posts above.
If I understand it right, I just add a second folder to the kohya training setup, named Steps_Name (repeat count, underscore, character name), put that character's images and prompt/.txt files in there, and then train the LoRA as usual?
For example, the kohya folder layout:

image/
  20_Character1/
    (images + .txt captions)
  17_Character2/
    (images + .txt captions)
log/
model/
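For reference, kohya reads the repeat count from the part of each subfolder name before the underscore, and each character's images plus their .txt captions go inside their own subfolder. A tiny sketch of setting up that layout (the folder and character names are just the example ones above):

```python
# Example only: build the kohya training layout shown above. The number before the
# underscore is the per-image repeat count kohya parses from the folder name.
from pathlib import Path

root = Path("image")
for folder in ("20_Character1", "17_Character2"):
    (root / folder).mkdir(parents=True, exist_ok=True)
# Drop each character's images and matching .txt caption files into its folder,
# then point the trainer's train_data_dir at the "image" folder as usual.
```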
Now, 5 months after the original post, here is my opinion:
First, the one point I 100% agree with: we have far too many checkpoints that could be replaced with a (slightly larger, ~100 MB) LoRA. It's already too late to go back for the 50 or so established ones, but hopefully we can keep people from creating more and more checkpoints and merges for ever-so-slight stylistic differences.
Regarding the multi-content LoRA:
I think single-character LoRAs are still the way to go. You can train a really good model into less than 10 MB, so you could have hundreds saved and barely notice.
As has been stated before: the more specific the LoRA, the easier it is to make it flexible and prevent over/undertraining.
Then, purely from the Civitai user standpoint: you either search for specific characters or see an example of a character you recognize. The first only works with proper tagging and concise names, which we really can't expect of people. The second would require example images that show all included characters; that might be possible with 3-4 in a slideshow (not exactly an optimal solution), but imagine 10 or more in one model.
Finally, my strong opinion on characters and their style: character LoRAs should rarely, if ever, be stylistically impactful. If I can't throw one into different base models or combine it with other style LoRAs, what is the point?
I'm landing on this whole question a little late, but I posted a tutorial on training LoRAs that goes into setting up multi-character LoRAs. Like some of the other answers note, it's not ideal, but it's something fun to play with. I put 5 different celebs into this LoRA: https://civitai.com/models/71568/multi-sharma where the only connecting characteristic between the persons is their last name (and of course that they are Bollywood actresses).
I'm still experimenting with LyCORIS/LoCon, so I'm going to see if I can fit multiple characters into a file that's just 20 MB.
I recently made a LoRA using Dreambooth for a single person (roughly 30 images), but I didn't check any recent guides for it. I managed to make the LoRA, but it doesn't work without the model it came from, and I don't know why. Does anybody know why that might have happened?