Putting this here until we have a Guides and Tutorials section on this website.
LoRA ART STYLE Training Guide
This guide, and subsequent improvements based on feedback I receive and on my own experiments, will focus specifically on ART STYLES ONLY.
Go elsewhere if you want to ask how to train a character/person. I don't care about training characters; there are enough tutorials about that already, and I will not help you with it at all.
This might not be optimal, but nobody else has done it, so let me try.
To follow this guide to the letter, you will need the following:
A GPU that supports xformers and fp16
The ability to troubleshoot without my help
The ability to read what I said above and not ask questions about training characters/people. This is a guide for STYLE.
Contempt for Google Colab, so you won't ask 'How can I do this on Google Colab?'
You won't ask 'How do I do this on a 6 GB card?!' Maybe you can, but I don't care about helping with that. HOWEVER, if you discover how to do it on a low-VRAM card, you're very welcome to share the method.
INSTALLATION and INITIAL SETUP
1. Run the two installer .bat files. They will create folders in that directory and install into them. Install on your C: drive or it probably won't work. Also, it's Windows only; if you're on Linux you probably know what you're doing anyway.
2. Download a model to use as a base. I use https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/v1-5-pruned-emaonly.safetensors
I recommend making a new folder called training and putting a copy of this file in it. That makes selecting the file during training setup more convenient.
3. Create a new folder inside /training named after your project: /training/projectname
3a. Inside /training/projectname, create three folders: /image, /log and /model
3b. Inside the /image folder, create a new folder called /10_projectname. 10 is the number of times each image is repeated per epoch. 10 seems good, but if your training image set is very large, you might try 5 instead.
After completing the install and setting up the folders/directories, do the following.
1. Class regularization images (optional): In the /training folder, make a folder called /regulation. Generate 1,000 to 2,000 class regularization images with the SD 1.5 model, using a prompt such as 'art style', 'artwork style', 'illustration style', 'painting', 'painting style' or 'art', depending on what kind of style it is.
1a. Save or move those images into /training/regulation/art-style (or a folder named after whatever prompt you used to generate them, hyphenated or not; hyphenating is just my personal preference).
2. Prepare your training set. You can use IrfanView to batch-process the images, including renaming them. I recommend giving every filename the same tag at the beginning, so that tag works as an effective additional keyword prompt. For example, if you're training on Dark Souls 3 screenshots, put 'dark souls 3' at the start of each filename, followed by other tags that describe the style in general: 'dark souls 3, art style, video game screenshot, 3d game' or similar.
2a. Put the training set images into the 10_projectname folder. Note: Make sure the total number of images is an even number.
3. Start the training wizard by running /sd-scripts/run_popup.bat. This folder and file were installed by the scripts we ran in Installation step 1.
My recommended parameters:
load a json config file: no
base model: v1-5-pruned-emaonly.safetensors (the copy in /training)
image folder: /training/projectname/image (do not choose the 10_projectname folder!)
output folder: /training/projectname/model
save log? yes: /training/projectname/log
regularization images? yes or no (optional); if yes: /training/regulation/art-style
continue from earlier version? no
batch size: 1 for low VRAM, 2 for high VRAM
number of epochs: 40
dim: 1, 2, 4, 8, 16, 32, 64 or 128 (choose any. The higher the number, the larger the file. I recommend trying a low number: even 1 works well, and the file will only be about 1 MB. My model published here uses only 1, and the results are okay.)
resolution: 512 or 768, depending on your training set. I just use the standard 512.
learning rate: 1e-3, 1e-4, 1e-5, 5e-4, etc. (I recommend trying 1e-3, which is 0.001; it's quick and works fine. 5e-4 is 0.0005.)
text encoder learning rate: choose none if you don't want to train the text encoder; otherwise use the same value as your learning rate or a lower one.
unet learning rate: choose same as the learning rate above (1e-3 recommended)
scheduler: cosine with restarts
cosine restarts: 12
save epochs? yes
how often to save? every 2 or 4 epochs, so you can experiment with the saved epochs and find which is best
shuffle captions? no
keep some tokens at the front? no
warmup ratio? no
change output name: projectname
meta comment: include the main filename keyword. It has no effect on training; it's just a nice way to tell people which keyword prompt works well alongside the LoRA.
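To get a feel for what these settings mean in practice, here is a back-of-the-envelope sketch (Python 3.9+) of the step count and of which epochs get saved. It assumes one step processes `batch_size` images and each image is seen `repeats` times per epoch (the 10 in 10_projectname); it ignores regularization images and any aspect-ratio bucketing, so the wizard's own log is the authority. It also confirms the scientific notation used for learning rates.

```python
import math


def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int) -> int:
    """Approximate optimizer steps: each epoch shows every image `repeats` times."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def saved_epochs(epochs: int, every: int) -> list[int]:
    """Epochs that produce a checkpoint, assuming a save at every multiple of `every`."""
    return list(range(every, epochs + 1, every))


# Scientific notation: 1e-3 is 0.001 and 5e-4 is 0.0005.
assert 1e-3 == 0.001 and 5e-4 == 0.0005

# Hypothetical set of 30 images with the settings above: 10 repeats, 40 epochs, batch 2.
print(total_steps(30, 10, 40, 2))  # 6000
print(saved_epochs(40, 4))         # [4, 8, 12, 16, 20, 24, 28, 32, 36, 40]
```

Doubling the batch size halves the step count for the same number of epochs, which is one reason the number of epochs, rather than raw steps, is what the guide fixes at 40.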
NOTE: Train UNet / Train text encoder: This is where you can restrict the LoRA to the UNet only or the text encoder only; otherwise it trains both. Training both at the same time is arguably best, and the text encoder helps if your captions are specific. I would recommend trying UNet only first.
To train the UNet only:
train unet only? true (this trains only the UNet and reduces VRAM usage. Choosing false makes the next option, train text encoder only, available.)
train text encoder only: false (this option is only available if you chose not to train the UNet only. Choose false to train both the UNet and the text encoder, which might give better results, although I haven't noticed any quality difference worth mentioning.)
To train the text encoder only:
train unet only? false
train text encoder only: true
To train both the UNet and the text encoder:
train unet only? false
train text encoder only: false
queue another training? Choose yes if you want to start another training run right after this one. I'd recommend it as a convenient way to train with different options (a higher dim, training both UNet and text encoder, etc.), so you can leave them all running while you do other things, such as sleep.
Please give it a try, post your results here, and publish any models that turn out generally successful.
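The three UNet/text encoder combinations can be summarized as a small lookup. This sketch assumes the wizard is a front end for kohya-ss sd-scripts' train_network.py and maps the answers onto that script's `--network_train_unet_only`, `--network_train_text_encoder_only`, `--unet_lr` and `--text_encoder_lr` flags as I understand them; `build_target_args` is a hypothetical helper, so check the flag names against `python train_network.py --help` for your version.

```python
from typing import List, Optional


def build_target_args(target: str, lr: float = 1e-3,
                      text_encoder_lr: Optional[float] = None) -> List[str]:
    """Map 'unet', 'text_encoder' or 'both' to (assumed) train_network.py flags."""
    te_lr = lr if text_encoder_lr is None else text_encoder_lr
    if target == "unet":
        # train unet only? true
        return ["--network_train_unet_only", f"--unet_lr={lr}"]
    if target == "text_encoder":
        # train unet only? false / train text encoder only: true
        return ["--network_train_text_encoder_only", f"--text_encoder_lr={te_lr}"]
    if target == "both":
        # train unet only? false / train text encoder only: false
        return [f"--unet_lr={lr}", f"--text_encoder_lr={te_lr}"]
    raise ValueError(f"unknown target: {target!r}")


print(build_target_args("unet"))                        # ['--network_train_unet_only', '--unet_lr=0.001']
print(build_target_args("both", text_encoder_lr=5e-5))  # ['--unet_lr=0.001', '--text_encoder_lr=5e-05']
```

The second call uses the lower 5e-5 text encoder rate suggested in the comments, while keeping the recommended 1e-3 for the UNet.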
If you have any issues, make sure you haven't just skimmed these instructions and have actually read them carefully. I mainly want to focus on improving the training parameters for people who are already up and running.
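Special respect for point 7. :))
I suggest using 5e-5 for the text encoder.
Thanks! I have a question. I watched some tutorials on YouTube that said training a style LoRA requires describing images in as much detail as possible. I'm confused, because your method seems to work well despite doing the opposite. I think there are reasons why both strategies work well.
Very detailed, thank you.
1. Compared with training characters, this seems to add Class regulation images, but after going through the whole process I still don't know what they do or why they matter. If the art style I want to train is my own and isn't available anywhere else, is there any point to them at all?
4. Learning rate: "1e-5, 5e-4, etc. (I recommend trying 1e-3, which is 0.001; it's quick and works fine. 5e-4 is 0.0005)", with alpha set to 1. I'll try these numbers in my next training run.
I keep getting an error: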
No data found. Please verify arguments (train_data_dir must be the parent of folders with images)
I don't understand the directory setup. Can you list the directories in a simple way?
Besides speeding up training, batch size also affects generalization. The LoRA reproduces the training images most faithfully at batch size 1, provided nothing else goes wrong. As the batch size grows, training leans more toward the features and style the images have in common. So for characters a batch size of 1 or 2 is recommended, while for styles a batch size of at least 2 is recommended.
I think a style LoRA needs a dataset of 20+ images. The dataset matters much more than dim and learning rate.