Textual Inversion vs Hypernetworks vs LoRA vs Dreambooth: What is the best method for training SD?
If I understand correctly, to train SD on a specific person's face it is best to use textual inversion or LoRA? And to train SD on a specific style or on complex abstractions, it is better to use hypernetworks? As far as I understand, Dreambooth should be used to train your own complex models, where the number of training photos is much larger than 20 or 30. For this reason, LoRA seems better suited for training SD on a specific object or style. The same goes for hypernetworks, which essentially do the same thing but take much longer. Or does the hypernetwork training method differ from the LoRA training method? If so, does it make sense to use it?

And what about textual inversion? It is also suitable for "training" on faces, styles, and objects, but it doesn't require "fusion" with the model the training was done on. As I understand it, it just trains the existing model to decompose weights to get what we created the training file on? If so, it is logical to assume that textual inversion will always produce worse results than LoRA, hypernetworks, or Dreambooth.

I am confused, and I would like to hear from people who know the subject whether I have understood everything correctly or my guess is wrong. And what is the best method for training SD on a person's face? I'm pretty sure Dreambooth is not suitable for this because of its VRAM requirements and large output file size. Also, it cannot be used to embed into any other model in the webui the way hypernetworks, textual inversion, or LoRA can.
1 Answer
LoRA seems to be the most popular and the easiest to use with Kohya_SS. There are many comparison videos on YouTube.
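On the file-size concern raised in the question, a rough sketch may help show why a LoRA file is so much smaller than a full fine-tuned checkpoint. LoRA leaves the original weight matrix W frozen and learns two small matrices B and A, so that the effective weight is W + (alpha / rank) * B @ A. The layer size of 768 x 768 and rank of 8 below are illustrative assumptions, not values taken from any particular SD model or from Kohya_SS, and the helper names are hypothetical:

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable parameter counts for one weight matrix."""
    full = d_out * d_in            # fine-tuning the whole matrix
    lora = rank * (d_in + d_out)   # B is d_out x rank, A is rank x d_in
    return full, lora


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_lora(w, b, a, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) without modifying W."""
    delta = matmul(b, a)
    scale = alpha / rank
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Illustrative sizes only (assumed, not measured from a real model).
full, lora = lora_param_counts(768, 768, 8)
print(full, lora, full // lora)
```

With these assumed sizes, the low-rank pair holds 48 times fewer trainable values than the full matrix. Since W itself is never changed, the small B and A matrices can be saved on their own and applied on top of a compatible base model.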