Guide: Training a LoRA for Hunyuan video on Windows
Version 0.2
Adjusted epochs and image repeats; tune these to your dataset size. For my style LoRA, I used 2500-3000 steps in total. Added info about resuming from a checkpoint.
Version 0.1
Feedback welcome. I already had a WSL environment set up, so some steps may be incorrect!
It's fairly complex, so wait for a native Windows tool if this seems too difficult. I haven't experimented much with the training settings, but these worked for me for a style LoRA.
This guide is aimed at training the LoRA on images; small modifications are needed to train on video.
## Initial Setup
### 1. WSL Setup
# In Windows PowerShell (Admin)
wsl --install
After installation completes:
1. Restart your computer
2. Ubuntu will automatically start and ask you to:
- Create a username
- Set a password
Remember these credentials as you'll need them for sudo commands.
### 2. Ubuntu Initial Setup
# Update package lists
sudo apt update
sudo apt full-upgrade -y
# Install basic dependencies
sudo apt-get install -y git-lfs wget python3-dev build-essential
# Enable Git LFS so the model repositories cloned later pull their large files
git lfs install
### 3. Verify NVIDIA Setup
# Check NVIDIA drivers are working
nvidia-smi
# Should show your GPU(s) and driver version
# If this fails, you may need to install the NVIDIA CUDA driver for WSL:
# Download from: https://developer.nvidia.com/cuda/wsl
### 4. Miniconda Installation
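wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b
# Batch mode (-b) does not modify ~/.bashrc, so initialize conda for bash explicitly
~/miniconda3/bin/conda init bash
source ~/.bashrc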
### 5. Clone Training Repository
git clone --recurse-submodules https://github.com/tdrussell/diffusion-pipe
cd diffusion-pipe
### 6. Setup Python Environment
conda create -n diffusion-pipe python=3.12
conda activate diffusion-pipe
# Install PyTorch before installing requirements.txt. These two steps are the most likely to cause problems. These are the versions that worked for me, but YMMV.
pip install torch==2.4.1 torchvision==0.19.1 --index-url https://download.pytorch.org/whl/cu121
pip install torchaudio==2.4.1+cu121 --index-url https://download.pytorch.org/whl/cu121
# Install requirements
pip install -r requirements.txt
# Issues:
- If you encounter CUDA compilation errors during pip install of DeepSpeed or other packages, you may need to install nvidia-cuda-toolkit via apt
- Solve other pip/torch errors with your favorite LLM
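Since the PyTorch install is the step most likely to go wrong, it can help to confirm that the installed build actually sees the GPU before moving on. The following is a minimal sketch; torch_report is a hypothetical helper name (not part of diffusion-pipe), and it assumes the diffusion-pipe conda environment is active so that python points at the right interpreter.

```shell
# Print the installed torch version, the CUDA version it was built against,
# and whether it can see a GPU.
# torch_report is a hypothetical helper, not part of diffusion-pipe.
torch_report() {
  py="${1:-python}"   # interpreter to use; defaults to "python" on PATH
  "$py" -c 'import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())' 2>/dev/null ||
    echo "torch could not be imported with $py"
}

torch_report
```

With the versions pinned above, I would expect output along the lines of `2.4.1+cu121 12.1 True`. If the last value is False, recheck the nvidia-smi step before continuing.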
## Accessing Files in Windows
You can access your WSL files in Windows File Explorer by navigating to this directory (Ubuntu folder may differ in name):
\\wsl$\Ubuntu\home\yourusername\diffusion-pipe\
Replace 'yourusername' with the username you created during WSL setup.
This allows you to easily transfer images to your training folder and copy the finished LoRA to ComfyUI.
## Download and Organize Models
If you already have these model files on Windows, copy them into the matching folders under ~/diffusion-pipe/models (hunyuan, clip, llm).
Otherwise, download them:
cd ~/diffusion-pipe
mkdir -p models/{hunyuan,clip,llm}
# Download HunyuanVideo files
wget https://huggingface.co/Kijai/HunyuanVideo_comfy/resolve/main/hunyuan_video_720_cfgdistill_fp8_e4m3fn.safetensors -P ~/diffusion-pipe/models/hunyuan/
wget https://huggingface.co/Kijai/HunyuanVideo_comfy/resolve/main/hunyuan_video_vae_bf16.safetensors -P ~/diffusion-pipe/models/hunyuan/
# Download CLIP model
git clone https://huggingface.co/openai/clip-vit-large-patch14 models/clip
# Download LLM
git clone https://huggingface.co/Kijai/llava-llama-3-8b-text-encoder-tokenizer models/llm
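Before writing the config, a quick check that everything landed where the config expects it can save a failed training launch. This is only a sketch: check_models is a hypothetical helper, and it assumes the models/hunyuan, models/clip and models/llm layout used in this guide.

```shell
# Report whether the model files and folders referenced by config.toml exist.
# check_models is a hypothetical helper; the layout is the one from this guide.
check_models() {
  root="$1"
  rc=0
  for f in \
    "$root/models/hunyuan/hunyuan_video_720_cfgdistill_fp8_e4m3fn.safetensors" \
    "$root/models/hunyuan/hunyuan_video_vae_bf16.safetensors"
  do
    # -s is true only for files that exist and are not empty
    if [ -s "$f" ]; then echo "ok       $f"; else echo "MISSING  $f"; rc=1; fi
  done
  for d in "$root/models/clip" "$root/models/llm"; do
    # A cloned repository should leave at least one entry in the folder
    if [ -n "$(ls -A "$d" 2>/dev/null)" ]; then echo "ok       $d"; else echo "EMPTY    $d"; rc=1; fi
  done
  return "$rc"
}

# Usage: check_models ~/diffusion-pipe
```

If a cloned folder contains only tiny files of a few hundred bytes where large weights should be, Git LFS was probably not active during the clone; running git lfs pull inside that folder should fetch the real files.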
## Configuration Files
Create two configuration files in the main directory (diffusion-pipe): config.toml and dataset.toml. Both are in the attachments (rename them to .toml).
### 1. Training Configuration (config.toml)
Example config.toml (adjust as necessary):
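# Output path for training runs.
# TOML strings are literal: if the trainer does not expand '~' on your setup,
# use absolute paths such as /home/yourusername/... here and in [model] below.
output_dir = '~/training_output'
# Dataset config file.
dataset = 'dataset.toml'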
# Training settings
epochs = 50
micro_batch_size_per_gpu = 1
pipeline_stages = 1
gradient_accumulation_steps = 4
gradient_clipping = 1.0
warmup_steps = 100
# eval settings
eval_every_n_epochs = 5
eval_before_first_step = true
eval_micro_batch_size_per_gpu = 1
eval_gradient_accumulation_steps = 1
# misc settings
save_every_n_epochs = 5
checkpoint_every_n_minutes = 30
activation_checkpointing = true
partition_method = 'parameters'
save_dtype = 'bfloat16'
caching_batch_size = 1
steps_per_print = 1
video_clip_mode = 'single_middle'
[model]
type = 'hunyuan-video'
transformer_path = '~/diffusion-pipe/models/hunyuan/hunyuan_video_720_cfgdistill_fp8_e4m3fn.safetensors'
vae_path = '~/diffusion-pipe/models/hunyuan/hunyuan_video_vae_bf16.safetensors'
llm_path = '~/diffusion-pipe/models/llm'
clip_path = '~/diffusion-pipe/models/clip'
dtype = 'bfloat16'
transformer_dtype = 'float8'
timestep_sample_method = 'logit_normal'
[adapter]
type = 'lora'
rank = 64
dtype = 'bfloat16'
[optimizer]
type = 'adamw_optimi'
lr = 5e-5
betas = [0.9, 0.99]
weight_decay = 0.02
eps = 1e-8
### 2. Dataset Configuration (dataset.toml)
# Resolution settings.
# Can be raised to 1024 for image training, especially on 24 GB cards.
resolutions = [512]
# Aspect ratio bucketing settings
enable_ar_bucket = true
min_ar = 0.5
max_ar = 2.0
num_ar_buckets = 7
# Frame buckets (1 is for images)
frame_buckets = [1]
[[directory]]
# Set this to where your dataset is
path = '~/training_data/images'
# Reduce as necessary
num_repeats = 5
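To see how epochs and num_repeats translate into the 2500-3000 total steps mentioned in the version notes, a rough estimate helps. The sketch below assumes a single GPU and that one optimizer step consumes micro_batch_size_per_gpu × gradient_accumulation_steps samples; estimate_steps is a hypothetical helper, and aspect-ratio bucketing may shift the real count slightly.

```shell
# Rough optimizer-step estimate for a single-GPU run.
# Assumption: one step consumes micro_batch * grad_accum samples, and each
# epoch sees every image num_repeats times.
estimate_steps() {
  images="$1"; repeats="$2"; epochs="$3"; micro_batch="$4"; grad_accum="$5"
  per_step=$(( micro_batch * grad_accum ))
  # Round up: a partial final batch is counted as a step in this estimate
  steps_per_epoch=$(( (images * repeats + per_step - 1) / per_step ))
  echo $(( steps_per_epoch * epochs ))
}

# 45 images with the example settings: 5 repeats, 50 epochs, batch 1, accumulation 4
estimate_steps 45 5 50 1 4   # prints 2850
```

With a smaller dataset, raise num_repeats or epochs to stay in the same range; with a larger one, reduce them.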
## Preparing Training Data
1. Create dataset directory:
mkdir -p ~/training_data/images
2. Place training images in the directory:
- LoRA: 20-50 diverse images
- Optional: Create matching .txt files with prompts (same name as image file)
Example structure:
~/training_data/images
├── image1.png
├── image1.txt # Optional prompt file
├── image2.png
├── image2.txt
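Captions are optional, but if you do use them it is easy to miss one. The sketch below lists images that have no same-named .txt file; missing_captions is a hypothetical helper, and the extension list is an assumption you may need to extend.

```shell
# List images in a dataset folder that have no same-named .txt caption file.
# missing_captions is a hypothetical helper; extend the extension list as needed.
missing_captions() {
  dir="$1"
  for img in "$dir"/*.png "$dir"/*.jpg "$dir"/*.jpeg "$dir"/*.webp; do
    [ -e "$img" ] || continue               # skip patterns that matched nothing
    [ -e "${img%.*}.txt" ] || echo "$img"   # strip the extension, look for .txt
  done
}

# Usage: missing_captions ~/training_data/images
```

No output means every image found has a caption file.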
## Training
Launch training with this command:
NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" deepspeed --num_gpus=1 train.py --deepspeed --config config.toml
## Monitoring Training
- Monitor GPU usage in a Windows terminal:
nvidia-smi --query-gpu=timestamp,name,temperature.gpu,utilization.gpu,memory.used,memory.total --format=csv -l 5
- Training outputs will be saved in the directory specified by output_dir in your config
## Resuming from checkpoint
If your computer crashes or you have to turn it off, use the --resume_from_checkpoint flag. If your GPU is a bit slow, consider checkpointing more often by lowering checkpoint_every_n_minutes (note that checkpoints use a lot of storage space). For example:
NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" deepspeed --num_gpus=1 train.py --deepspeed --config config.toml --resume_from_checkpoint
## Using the Trained LoRA
### After training completes, find your LoRA file:
- Navigate to the training output directory in Windows:
\\wsl$\Ubuntu\home\yourusername\training_output
- Look for the latest epoch folder
- Find the adapter.safetensors file
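If you prefer to stay in the WSL terminal, the latest epoch folder can also be located from there. This is a sketch under assumptions: latest_epoch_dir is a hypothetical helper, and it assumes saved epochs live in folders named like epoch5 or epoch10, possibly nested inside a per-run subfolder. With several runs in the same output directory it simply picks the highest epoch number overall.

```shell
# Print the highest-numbered epoch folder under a training output directory.
# latest_epoch_dir is a hypothetical helper; the "epochN" naming is an assumption.
latest_epoch_dir() {
  find "$1" -type d -name 'epoch*' 2>/dev/null |
    awk -F'epoch' '{ print $NF "\t" $0 }' |   # prefix each path with its epoch number
    sort -n | tail -n 1 | cut -f2-            # keep the path with the largest number
}

# Usage: ls -l "$(latest_epoch_dir ~/training_output)"
```

From there the .safetensors file can be copied to a Windows folder through /mnt/c, which is where WSL mounts the C: drive.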
### Using with ComfyUI:
- Copy adapter.safetensors to your ComfyUI loras folder and rename it to something descriptive
- Make sure you have the HunyuanVideoWrapper node installed: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper
- Use the "HunyuanVideo Lora Select" node to load it
- Experiment with different epochs to find the ideal number for your dataset