Stable Cascade AI Avatar LoRA Training

Introduction:

In this tutorial, I will show you how to train your own AI avatar using Stable Cascade. This will allow you to generate images of yourself in high quality.

Why Cascade over SDXL?

There are a few reasons why I recommend using Cascade over SDXL for training your AI avatar:

Higher resolution: Cascade can render images at a much higher resolution than SDXL. This means that you can create images that are more detailed and realistic.
Better aesthetics: I personally find that Cascade images are more aesthetically pleasing than SDXL images. This is subjective, of course, but it is something to keep in mind.

Prerequisites:

Before you begin, you will need the following:

A dataset of high-quality images
A computer with a powerful graphics card
The CaptainCaption captioning tool
The OneTrainer training tool

Step 1: Gather your dataset

The first step is to gather a dataset of high-quality images of yourself. These images should be clear, detailed, and well-lit. It is also important to have a variety of poses and angles represented in your dataset.

I recommend using 15-30 images for your dataset. This is a good number to start with, and you can always add more images later if you want.
Thanks to the "bucketing" feature of OneTrainer, the images in the dataset no longer need to have a specific aspect ratio. A resolution between 1024 and 2048 is recommended.
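To make the idea of bucketing more concrete, here is a minimal sketch of how an image could be assigned to the bucket with the closest aspect ratio. This is only an illustration and not OneTrainer's actual implementation: the bucket list, the function names, and the reading of "between 1024 and 2048" as the longer image side are all my assumptions.

```python
# Hypothetical bucket list (width, height); OneTrainer's real buckets may differ.
BUCKETS = [(1024, 1024), (1152, 896), (896, 1152), (1280, 768), (768, 1280)]


def nearest_bucket(width, height, buckets=BUCKETS):
    """Return the bucket whose aspect ratio is closest to the image's."""
    ratio = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))


def in_recommended_range(width, height, low=1024, high=2048):
    """Check the longer side against the recommended 1024-2048 range.

    Assumption: the recommendation refers to the longer side of the image.
    """
    return low <= max(width, height) <= high
```

With a check like this you can quickly spot images that are too small before training, instead of cropping everything to a square by hand.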

Step 2: Captioning

A dataset not only includes the images but also a text file for each image that describes its content as best as possible.
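A quick way to verify that every image really has its text file is a small script like the one below. It is a sketch under one assumption: that the caption file sits next to the image and has the same name with a .txt extension (for example photo01.jpg and photo01.txt). The function name and the list of image extensions are my own choices.

```python
from pathlib import Path

# Assumed set of image extensions; extend it if your dataset uses others.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def images_missing_captions(dataset_dir):
    """Return the names of images that have no same-named .txt file next to them."""
    folder = Path(dataset_dir)
    missing = []
    for path in sorted(folder.iterdir()):
        is_image = path.suffix.lower() in IMAGE_EXTENSIONS
        if is_image and not path.with_suffix(".txt").exists():
            missing.append(path.name)
    return missing
```

If the returned list is empty, every image in the folder has a caption file.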

Writing captions manually:

We could write these text files ourselves, but this is time-consuming and tedious.

Automatic captioning with GPT-4-Vision:

Fortunately, there are now advanced AI models that can help us with this task. GPT-4-Vision is a particularly powerful model for automatic image captioning.

42Lux CaptainCaption:

42Lux has developed a user-friendly Gradio app called CaptainCaption that uses GPT-4-Vision for automatic image captioning.

Advantages of CaptainCaption:

Precise and creative captions: GPT-4-Vision generates precise and creative descriptions that capture the content of the image in detail and vividly.

No post-processing required: Unlike other captioning methods, CaptainCaption usually does not require any post-processing of the automatically generated texts.

Easy to use: The Gradio app provides an intuitive user interface that makes CaptainCaption easy to use even for beginners.

Instructions:

Clone the GitHub repository:

git clone https://github.com/42lux/CaptainCaption

Start CaptainCaption from inside the cloned folder:

cd CaptainCaption
python main.py

Configure the Gradio app:

Select the folder of your dataset.

Enter your OpenAI API key.

Enter the token of your LoRA in the prefix field.

Start the captioning process:

The Gradio app will now load the images of your dataset and automatically generate captions for each image. The finished captions are displayed in the app and can be saved as a text file.
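After captioning, it can be worth checking that every caption actually starts with your LoRA token. The following sketch assumes the captions were saved as .txt files in the dataset folder and that the prefix ends up at the very beginning of each caption. Both points are assumptions on my side, and the function name is hypothetical.

```python
from pathlib import Path


def captions_missing_prefix(dataset_dir, token):
    """Return the names of caption files whose text does not start with the token."""
    folder = Path(dataset_dir)
    return sorted(
        path.name
        for path in folder.glob("*.txt")
        if not path.read_text(encoding="utf-8").lstrip().startswith(token)
    )
```

Any file name returned here is a caption you may want to open and fix by hand before training.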

Here is an example caption:

Step 3: Cascade LoRA training
I train the Cascade LoRA with OneTrainer by Nerogar.
You can download my settings preset here in the article.
To train a Cascade LoRA, you only need to download "effnet_encoder.safetensors" manually and enter its folder in OneTrainer. All other required models are downloaded automatically.

Configure the training settings: In OneTrainer, you can configure various settings for the training process. Here are my settings:

Optimizer: ADAFACTOR

Learning rate scheduler: CONSTANT

Learning rate: 0.0003

Warmup steps: 200

Total steps: Images * Repeats * Epochs

Target number of steps: 1500 (for a character training)
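The step arithmetic above can be worked out with a few lines of Python. This is a small sketch with hypothetical function names. The batch_size parameter is my addition and defaults to 1, which matches the formula above; with a larger batch size, each step processes several images.

```python
import math


def total_steps(num_images, repeats, epochs, batch_size=1):
    """Total steps = images * repeats * epochs (for batch size 1)."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def epochs_for_target(num_images, repeats, target_steps=1500, batch_size=1):
    """Smallest number of epochs that reaches at least target_steps."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return math.ceil(target_steps / steps_per_epoch)


# Example: 20 images with 1 repeat need 75 epochs to reach 1500 steps.
print(epochs_for_target(20, 1))   # 75
print(total_steps(20, 1, 75))     # 1500
```

For a dataset of 15-30 images, this gives a rough idea of how many epochs to enter so that you land near the 1500-step target.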

Start training: Once you have configured the training settings, start the training run in OneTrainer.

To create the images, I use the following workflow. If you want to create close-ups, you can disable the face detailer section.
Stable Cascade with Face_Detailer [ComfyUi]

I was very surprised by how accurate a Cascade person LoRA is compared to an SDXL LoRA. According to my current tests, Cascade LoRAs are also very flexible and respond relatively well to prompts, which leaves a lot of room for creative results. Here are a few sample images of the finished LoRA:

I am very impressed with the results of the Cascade LoRA training. I think it is a powerful tool that can be used to create high-quality images of people.
