HunYuan LoRA Training Guide [RUNPOD]


Hello! This is an easy-to-follow guide on how to create your first LoRA for HunYuan Video using diffusion-pipe on RunPod.
I'm having a lot of fun playing with the different settings, but for now this guide covers your very first run. It's based on three other guides whose steps didn't work for me as written, so I put together this more up-to-date version.

If you'd like to support me for more guides in the future, please consider donating [CLICK HERE]

______________________________________________________________
Before starting, prepare your dataset. Choose high-quality images/videos (different resolutions are fine) and create a text file with the same name as each image/video containing its description. Do this beforehand to save credits!
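For reference, a prepared dataset folder might end up looking something like this (the filenames here are just placeholders, not from the original guide):

img_001.jpg
img_001.txt
img_002.png
img_002.txt
clip_001.mp4
clip_001.txt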

Create a Runpod account.

Add balance.

Start a Pod using the ComfyUI CUDA 12 Light template (preferably on an A100).

Wait for the instance to initialize and connect to port 7777.

(This instance has VSCode installed instead of the regular old Jupyter Notebook)

In the VSCode tab, click the hamburger menu icon and open a terminal.

Clone the repository using the following command:

git clone --recurse-submodules https://github.com/tdrussell/diffusion-pipe

Install git and Git LFS (needed to clone the model repositories from Hugging Face):

apt-get update
apt-get install git
apt-get install git-lfs
git lfs install

Create a folder to download the models into (make sure you're in /workspace, since the config paths later in this guide point to /workspace/models):

mkdir models

Navigate to the folder:

cd models

Download the models using wget. The downloads take a while; you can run them one at a time, or open multiple terminals in the folder and download them in parallel to save time/credits (see the sketch after these commands):

wget https://huggingface.co/Kijai/HunyuanVideo_comfy/resolve/main/hunyuan_video_720_cfgdistill_bf16.safetensors
wget https://huggingface.co/Kijai/HunyuanVideo_comfy/resolve/main/hunyuan_video_vae_bf16.safetensors
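As an alternative to opening multiple terminals, here is a small sketch of running both downloads in parallel from one terminal: the trailing & sends each wget to the background, and wait blocks until both have finished.

wget https://huggingface.co/Kijai/HunyuanVideo_comfy/resolve/main/hunyuan_video_720_cfgdistill_bf16.safetensors &
wget https://huggingface.co/Kijai/HunyuanVideo_comfy/resolve/main/hunyuan_video_vae_bf16.safetensors &
wait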

You'll also need to clone two repositories into the models folder: one for the LLM and another for CLIP:

git clone https://huggingface.co/Kijai/llava-llama-3-8b-text-encoder-tokenizer/
git clone https://huggingface.co/openai/clip-vit-large-patch14
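Once everything has finished, a quick listing of the models folder should show the two .safetensors files plus the two cloned repos (assuming the folder was created inside /workspace, which is what the paths later in this guide expect):

ls /workspace/models
# clip-vit-large-patch14  hunyuan_video_720_cfgdistill_bf16.safetensors
# hunyuan_video_vae_bf16.safetensors  llava-llama-3-8b-text-encoder-tokenizer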

Open a new terminal in the workspace folder (right-click on an empty space in the folder) and install Miniconda for Linux:

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh

After installing, close and reopen your terminal application or refresh it by running the following command:

source ~/miniconda3/bin/activate

To initialize conda on all available shells, run the following command:

conda init --all

Create a terminal in the diffusion-pipe folder.

Create a Python 3.12 environment for the project:

conda create -n diffusion-pipe python=3.12

Activate the environment:

conda activate diffusion-pipe

Install PyTorch:

pip install torch==2.4.1 torchvision==0.19.1 --index-url https://download.pytorch.org/whl/cu121
pip install torchaudio==2.4.1+cu121 --index-url https://download.pytorch.org/whl/cu121
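Optional sanity check (not in the original steps): confirm that PyTorch installed with CUDA support before continuing; it should print the version and True.

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"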

Now, still in the diffusion-pipe folder, install the dependencies from requirements.txt:

pip install -r requirements.txt

Now it’s time to configure the training run. In the diffusion-pipe/examples folder, let's modify the hunyuan_video.toml file.

First, change the model folder paths (lines 69-72):

transformer_path = '/workspace/models/hunyuan_video_720_cfgdistill_bf16.safetensors'
vae_path = '/workspace/models/hunyuan_video_vae_bf16.safetensors'
llm_path = '/workspace/models/llava-llama-3-8b-text-encoder-tokenizer'
clip_path = '/workspace/models/clip-vit-large-patch14'

In another tutorial, we’ll discuss training parameters, but for now, let’s focus on starting our first run.

At the top of the hunyuan_video.toml file, also change the output folder path (line 2):

output_dir = '/workspace/data/diffusion_pipe_training_runs/hunyuan_video_test'

Now, in the dataset.toml file, change the dataset folder path (line 34):

path = '/workspace/train'

We now need to create the train folder in the workspace.
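From the terminal, this is a single command (the path matches the one set in dataset.toml above):

mkdir -p /workspace/train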

Then simply drag your dataset into the train folder. Remember, the dataset must contain image/video pairs with a text file that has the same name as each image/video.

For character training, your dataset only needs images; for movement training, use videos of up to 2 seconds (60 frames) in any format supported by imageio (everything except webp).

Captions for image datasets can be short tag-style (CLIP-like) or fully descriptive, with better results coming from longer descriptions. For videos, an extensive description is necessary; a rough example follows.
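As an illustration (these captions are invented for this example, not from the original guide), an image caption and a video caption might look like:

img_001.txt:
a photo of a man with short brown hair, wearing a black jacket, standing on a city street at night

clip_001.txt:
a man with short brown hair slowly raises his right arm and waves at the camera, the camera stays static, soft indoor lighting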

With your dataset ready in the train folder, run the following command in the terminal to start training:

NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" deepspeed --num_gpus=1 train.py --deepspeed --config examples/hunyuan_video.toml
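While training runs, you can keep an eye on GPU memory and utilization from a second terminal (optional, and assuming watch is available on the image):

watch -n 5 nvidia-smi

The saved LoRA epochs should show up under the output_dir you set earlier (/workspace/data/diffusion_pipe_training_runs/hunyuan_video_test).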

Let me know if you need any further adjustments or clarifications!
