Pose, Image, Audio to Video.
v1: uses the distilled model; faster inference, but skin can look plastic.
v2: uses the dev model; slower inference, but skin looks more natural.
Use the --reserve-vram 1 launch option if you run into out-of-memory (OOM) errors.
Tested with 16 GB VRAM and 64 GB system RAM at 1600 x 900 resolution, 121 frames.
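
For the OOM note above, here is a minimal sketch of how the flag might be passed. It assumes the workflow runs in ComfyUI and that ComfyUI is started from its own folder with main.py; the notes above do not name the launcher, so adjust the command to match your install (for example, a portable build's launch script).

```shell
# Assumption: ComfyUI is launched manually from its repository folder.
# In ComfyUI, --reserve-vram takes an amount in GB to keep free for the
# OS and other software, so 1 here means roughly 1 GB is left unused.
python main.py --reserve-vram 1
```

If OOM errors persist, lowering the resolution or frame count below the tested values is another option to try.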
