How To Train Styles (Based On SDXL)

First, we need to prepare an image dataset that meets the following requirements:

Images should be at least 1024 pixels in resolution and very clear.
The art style should be consistent across the whole dataset.
The images should cover as wide a variety of people and objects as possible.

When training an art style, prepare at least 50 images; 300-500 images is the recommended range. The model generalizes worse with fewer images. For example, even with a consistent art style, if the dataset consists of only 50 Asian portraits, the model may struggle to generate European or American portraits. It is therefore crucial to include a diverse range of portraits, scenes, animals, plants, and objects while keeping the art style consistent. With this approach the dataset quickly reaches 300-500 images. Take care, however, to avoid overly similar or duplicate images in the dataset.
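The duplicate check mentioned above can be partly automated. The following is a minimal sketch, assuming all images sit in one folder; the function name `find_exact_duplicates` and the list of extensions are illustrative choices, not part of any training tool. It only catches byte-identical files, so images that are merely similar still need a visual review.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image formats; adjust to match your dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def find_exact_duplicates(dataset_dir):
    """Return groups of image file names whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(dataset_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path.name)
    # Keep only the groups that contain more than one file.
    return [names for names in by_digest.values() if len(names) > 1]
```

Calling `find_exact_duplicates("dataset")` on a hypothetical `dataset` folder returns a list of groups; keep one file from each group and delete the rest.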

Here are some reference style datasets:

https://huggingface.co/datasets/ErhaChen/minecraft

https://huggingface.co/datasets/ErhaChen/sketch_for_art_examination

https://huggingface.co/datasets/ErhaChen/2d_game_scence

https://huggingface.co/datasets/ErhaChen/oil_and_watercolor_painting

Where do the training images come from? First, you can use the artworks of a particular artist, with their consent, which ensures a consistent style across the dataset. However, I highly recommend training on images generated by Midjourney, which can quickly produce a large variety of images in a unified art style. Finally, if you wish to train the art style of a specific computer game, you can log in to Steam, enter the game's player community, and use the screenshots uploaded by players. Regardless of the source material, don't forget to upscale the images at the end.

Next, we need to annotate the images, and we recommend starting with manual annotation: use natural language to describe as many objects in each image as possible. You can also use WD14 for machine annotation. Once machine annotation is complete, review the annotation files, because machine annotations are sometimes inaccurate.
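Before reviewing the annotations by hand, it can help to confirm that every image actually has one. The sketch below assumes each annotation is a `.txt` file with the same base name as its image, which is a common convention but is not stated in this article; `find_caption_problems` is an illustrative name.

```python
from pathlib import Path

# Assumed set of image formats; adjust to match your dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def find_caption_problems(dataset_dir):
    """Return (missing, empty): images with no caption file, or a blank one."""
    missing, empty = [], []
    for image in sorted(Path(dataset_dir).iterdir()):
        if not image.is_file() or image.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        # Assumption: the caption shares the image's base name, e.g. cat.png -> cat.txt
        caption = image.with_suffix(".txt")
        if not caption.exists():
            missing.append(image.name)
        elif not caption.read_text(encoding="utf-8").strip():
            empty.append(image.name)
    return missing, empty
```

This only checks that annotations exist and are non-empty; whether they describe the image accurately still has to be judged by a person.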

What are the instant prompt and the class prompt?

The class prompt is the category of the training subject. When training a character, you can fill in man, woman, girl, etc. Since we are training an art style, fill in style.

The instant prompt (called the instance prompt in DreamBooth-style trainers) is the name of the training subject, which can be a character's name, an art style name, an object name, etc.

The instant prompt can draw on related content already in the base model, which then serves as a foundation during training. For example, if you want to train a uniquely styled watercolor painting, you can fill in "watercolor" as the instant prompt. Training then starts from the watercolor style already present in the base model, which helps improve the final results.

How do you decide what to fill in for the instant prompt? Launch SD webUI, enter the candidate instant prompt as a prompt, and check whether it generates images in a similar art style. If it does, it can be used as the instant prompt.

Finally, we need to use the instant prompt as a trigger word by adding it to the end of each annotation file.
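Adding the trigger word to hundreds of annotation files by hand is tedious, so a small script can do it. This is a minimal sketch, assuming the annotations are `.txt` files in a single folder and that a comma is an acceptable separator; both are assumptions, and `append_trigger_word` is an illustrative name rather than part of any tool.

```python
from pathlib import Path


def append_trigger_word(dataset_dir, trigger, separator=", "):
    """Append trigger to each .txt caption that does not already end with it.

    Returns the number of files that were changed.
    """
    changed = 0
    for caption_path in sorted(Path(dataset_dir).glob("*.txt")):
        text = caption_path.read_text(encoding="utf-8").strip()
        if text.lower().endswith(trigger.lower()):
            continue  # already tagged, so running the script twice is safe
        new_text = f"{text}{separator}{trigger}" if text else trigger
        caption_path.write_text(new_text + "\n", encoding="utf-8")
        changed += 1
    return changed
```

With the trigger word used later in this article, an annotation such as `a girl holding an umbrella` would become `a girl holding an umbrella, oil and watercolor painting`. Back up the annotation files first, since the script overwrites them in place.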

During training, you can place all the images and annotation files into the same folder.

Now, you can proceed with training. The training parameters are as follows:

device : 4090
instant prompt : oil and watercolor painting
class prompt : style
image count : 661
batch size : 5
repeats : 20
batches per epoch : 2644
epoch : 15
clip_skip : 1
learning_rate : 0.0012
text_encoder_lr : 0.0012
unet_lr : 0.0012
max_resolution : 1024,1024
min_bucket_reso : 256
max_bucket_reso : 2048
network_dim : 128
network_alpha : 1
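The "batches per epoch" value follows from the image count, repeats, and batch size listed above. The short calculation below reproduces it; it assumes each epoch shows every image `repeats` times and that a final partial batch would still count as one batch. The variable names are illustrative, and the total step count is derived here rather than taken from the article.

```python
import math

image_count = 661
repeats = 20
batch_size = 5
epochs = 15

# Each image is seen `repeats` times per epoch.
samples_per_epoch = image_count * repeats                      # 13220
# Assumption: a trailing partial batch still counts as a batch.
batches_per_epoch = math.ceil(samples_per_epoch / batch_size)  # 2644
total_steps = batches_per_epoch * epochs                       # 39660

print(batches_per_epoch, total_steps)
```

The same arithmetic is useful when changing the dataset size: with fewer images, raising `repeats` keeps the number of batches per epoch in a similar range.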

Finally, we will have 15 models, one per epoch, and we need to test them one by one, because art style training sometimes completes in 3-5 epochs and other times requires more than 10.

The testing methods are as follows:

Input various prompts to test whether it can accurately depict the content of the prompts while maintaining the art style.

Combine with other character LoRA models to test whether it can accurately reproduce character features while maintaining the art style.

Combine with other base models to test compatibility.
