
Training Hunyuan Video LoRAs: A Guide (check for updates!)


This article covers training LoRA models on Hunyuan Video with various services and setups. I hope to offer some insight and additional information as my research progresses.


If you have a GPU, everything in this article applies, although you may be most interested in the local training tips towards the end. We will cover the services I have used to train video LoRAs, so you can do the same. It is actually incredibly easy, because QWEN-VL is used to automatically caption your videos. You can of course turn this off in most setups, but it means all you need is the dataset. That is where we will start:


VIDEO DATASET

We will start simple and work our way up.

There are two types of video dataset:

  1. Image-based
    With the image-based dataset, you simply dump frames from a video, remove all the blurry or empty frames, then save the rest into a .zip.

  2. Video segment-based
    With the video-based dataset, you split a target video into segments, remove any confusing or non-moving segments, then save the rest into a .zip.
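For the image-based route, the "remove blurry or empty frames" step can be automated. Below is a minimal sketch (the function names and threshold are my own, not from any specific tool) that scores frame sharpness with a Laplacian variance, a common heuristic where low values mean a blurry or flat frame:

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Sharpness score: variance of a 4-neighbour Laplacian.
    Near-zero values indicate a blurry or empty (flat) frame."""
    g = gray.astype(np.float64)
    lap = (-4.0 * g[1:-1, 1:-1]
           + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return float(lap.var())

def is_keeper(gray: np.ndarray, threshold: float = 100.0) -> bool:
    """Keep only frames whose sharpness clears the threshold."""
    return laplacian_variance(gray) >= threshold
```

Dump frames first with ffmpeg (e.g. `ffmpeg -i clip.mp4 frames/%05d.png`), load each frame as a grayscale array, drop the ones where `is_keeper` is False, and zip the survivors. The threshold is dataset-dependent, so inspect a few scores before committing to a value.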


We do not need to caption them, although you can, as with any image model, by creating a text file that matches the filename of each video/image and writing the description there.

After this we are ready to train our model. I have two easy methods for you that work great and will give you a model that stays close to the source (your dataset) and transfers style well, although you might do some things differently and gain other abilities.

TRAINING WITH H100


FAL.ai
I used this service for "image to video LoRA training",
by which I mean the video LoRA model is trained on still images.
https://fal.ai/models/fal-ai/hunyuan-video-lora-training

It requires a minimum of only 4 images, although I used 48 images hand-picked from my first test dataset. That dataset is based on a chase scene from a 90s JDM racing movie, as far as I can tell, because the source had no title. It features a yellow Nissan Skyline racing a white Mazda RX-7 FC at night on the Tokyo highways.
https://civitai.com/models/1210310?modelVersionId=1363196

This first model captures the aesthetic and even the vehicles accurately, despite being trained on 360p video frames. This means the model effectively upscales and fixes the compression of the source when it regenerates the scenes at higher resolutions.

Replicate.com
I used this service for "video to video LoRA training",
by which I mean the video LoRA model is trained on video segments.
https://replicate.com/zsxkib/hunyuan-video-lora/train

I used a ComfyUI workflow, "Video-Chunker", which I will link below this paragraph. It is very simple: it loads a video, then uses segment-based math to step through the video in "chunks" of frames, saving each chunk to a directory with Video Combine. You set the index to increment from zero, then click Queue repeatedly until the whole video has been processed. Expect videos on this, as we demonstrated it on livestream many times.

https://github.com/MushroomFleet/DJZ-Workflows/blob/main/DJZ-Nodes-Examples/DJZ-Nodes-Examples-Video-Chunker.json
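The chunk-stepping math behind this kind of workflow is straightforward: chunk i covers frames [i·N, i·N+N), and you stop once the index steps past the end of the video. A sketch of that logic (function names are mine, not the node's):

```python
def num_chunks(total_frames: int, chunk_size: int) -> int:
    """How many Queue presses are needed to cover the whole video."""
    return -(-total_frames // chunk_size)  # ceiling division

def chunk_bounds(total_frames: int, chunk_size: int, index: int):
    """Frame range [start, end) for a given incrementing index,
    or None once the index has stepped past the last frame."""
    start = index * chunk_size
    if start >= total_frames:
        return None
    return start, min(start + chunk_size, total_frames)
```

Note the final chunk is clamped to the video length, so the last segment may be shorter than the others; you may want to discard it if it is too short to be useful.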

Once I have my segmented video clips, we .zip them up and upload them to the trainer. The video trainer has many useful settings for controlling the training, but for now all you need is the defaults, although I do recommend the "uniform" frame sampling mode, as this is what I used for my first run with the same dataset as the example above:
https://civitai.com/models/1210310?modelVersionId=1364726

This second model is very nice; it has learned more than before, capturing the subtle camera movements from the source, with fewer hallucinations about whether cars are reversing or moving forwards.


I really hope that Civitai adds Hunyuan Video to the generator, and maybe, if we are lucky, a LoRA trainer to the site. I have already set my LoRAs to be allowed for generation in case they do decide to add these features in future! I really enjoyed using Civitai for training Flux/SDXL LoRAs in the past, so perhaps luck will smile upon us all.



LOCAL GPU TRAINING

TBA
check back soon
