Optimised Skyreels/Hunyuan GGUF I2V + Upscale (HLORA trigger words compatible) (3060 12GB VRAM + 32GB RAM) (Kijai WF)
Type | Workflows
Stats | 469
Reviews | 32
Published | Feb 21, 2025
Hash | AutoV2 71D98AE98F
I've got Buzz to tip with, so post your creations to the workflow gallery. Have fun!
Final barebones+ update with text-weighted Hunyuan LoRA compatibility has been published.
831.61 seconds (no US, i.e. no upscale)
932.07 seconds (no US)
Published vids are in the showcase.
Could potentially work on 8GB VRAM or lower if you tinker with virtual_vram_gb on the UnetLoaderGGUFDisTorchMultiGPU custom node (provided you have sufficient system RAM).
Stage 1: 415.369 s, Stage 2: 315.937 s, VAE: 70.838 s, total 837.93 seconds. Q6 + 6-step LoRA + Smooth LoRA + Dolly LoRA
(I have defaulted to DPM++ 2M / Beta + the Smooth LoRA (dropping the Smooth LoRA for human-centric subjects); average runtime 700-900 s at 73F, no US)
Comfyui_MultiGPU = UnetLoaderGGUFDisTorchMultiGPU (image batch 4 flux-finetune Q8)
Comfyui_KJNodes = TorchCompileModelHyVideo, Patch Sage Attention KJ, Patch Model Patcher Order (Add nodes>KJNodes>Experimental)
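Side note (my own assumption, not part of the original node list): Patch Sage Attention KJ also needs the SageAttention package installed into the same embedded Python; to my knowledge the PyPI package is named sageattention, so something like this run from python_embeded should cover it:
python.exe -m pip install sageattention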
∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨∨
https://huggingface.co/spacepxl/skyreels-i2v-smooth-lora
∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧∧
Fine-tune virtual_vram_gb to fit your requirements (I suggest checking the ComfyUI cmd window for the DisTorch allocation values that print after the model loads into SamplerCustom), or use the normal Unet Loader (GGUF) with skyreels-hunyuan-I2V-Q?_
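As a rough illustration only (every number here is an assumption, not a measurement): as I understand the node, virtual_vram_gb is roughly how many GB of the GGUF weights get shifted off the card into system RAM. So with an ~11 GB Q6_K model on a 12 GB card where you want several GB free for the 73-frame latents, a value around 5-6 would be a starting point; the DisTorch allocation printout shows the split that was actually applied, so adjust from there.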
Triton windows: https://github.com/woct0rdho/triton-windows/releases
Once you've downloaded the wheel that matches your embedded Python version, open a command prompt and navigate to the directory containing the downloaded file. Then run the install through python_embeded:
python.exe -m pip install triton-3.2.0-(filename)
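A hypothetical example of the full command (the wheel name and paths are placeholders, not your real locations; match the cpXX tag to your embedded Python, e.g. cp312 for Python 3.12):
cd C:\Users\you\Downloads
C:\ComfyUI_windows_portable\python_embeded\python.exe -m pip install triton-3.2.0-cp312-cp312-win_amd64.whl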
1st load
Prompt executed in 1662.22 seconds; subtracting 587.365 seconds for the upscale = ~1075 seconds for the base generation
640x864
73 frames (stable/generation time)
Steps: 6-12 (Stage 1 6 steps + Stage 2 6 steps)
cfg: 4.0
Sampler: Euler
Scheduler: Simple
(Original Kijai WF https://huggingface.co/Kijai/SkyReels-V1-Hunyuan_comfy/blob/main/skyreels_hunyuan_I2V_native_example_01.json)
Barebones I2V workflow with upscaler, optimised for a 3060 12GB VRAM + 32GB RAM system
Make sure you update ComfyUI, torch & CUDA:
Run the update_comfyui.bat from the update folder
Go back to your python_embeded folder
Click on the file directory bar at the top, type cmd then hit enter
In cmd type "python.exe -m pip install --upgrade torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu126"
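To confirm the upgrade took (a quick check of my own, not part of the original steps), you can print the torch build and CUDA status from the same python_embeded prompt:
python.exe -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"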
∨∨ May ruin older workflows ∨∨
If it still isn't working, run the other update bat: update_comfyui_and_python_dependencies.bat
∧∧ May ruin older workflows ∧∧
Workflow Resources:
Fast_Hunyuan Lora (models/lora): https://huggingface.co/Kijai/HunyuanVideo_comfy/blob/main/hyvideo_FastVideo_LoRA-fp8.safetensors
GGUF Model (Switch the models to fit your requirements) (models/unet):
https://huggingface.co/Kijai/SkyReels-V1-Hunyuan_comfy/blob/main/skyreels-hunyuan-I2V-Q6_K.gguf
VAE model (models/vae): https://huggingface.co/Kijai/HunyuanVideo_comfy/blob/main/hunyuan_video_vae_bf16.safetensors
Clip_l model (I renamed it to clip_hunyuan) (models/clip):
llava_llama3 model (models/clip):
https://huggingface.co/calcuis/hunyuan-gguf/blob/main/llava_llama3_fp8_scaled.safetensors
Upscale Model (models/upscale_models):
https://huggingface.co/uwg/upscaler/blob/main/ESRGAN/4x-UltraSharp.pth
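Once everything above is downloaded, the models folder should end up roughly like this (filenames taken from the links above, placed in the folders noted next to each link; the clip entry is whichever clip_l file you downloaded and renamed):
models/lora/hyvideo_FastVideo_LoRA-fp8.safetensors
models/unet/skyreels-hunyuan-I2V-Q6_K.gguf
models/vae/hunyuan_video_vae_bf16.safetensors
models/clip/clip_hunyuan.safetensors (the renamed clip_l model)
models/clip/llava_llama3_fp8_scaled.safetensors
models/upscale_models/4x-UltraSharp.pth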
Personal Generation Times
After-1st-load base gen runtimes (2 stages + VAE decode):
758.173 seconds
704.589 seconds
With the suggested LoRA, after 1st load:
779.494
169F tests after 1st (No Load Test):
OOM
121F test after 1st+6stepLORA+smoothLORA (No Load Test):
1st stage
525.14s 1st iteration
729.66s 2nd
736.19s 3rd
645.15s 4th
665.55s 5th
764.12s 6th/Average
2nd stage
81.90s 1st+2nd iteration
OOM
An instant requeue after the OOM resumes from the 2nd stage:
6.17s 1st Iteration
113.74s 2nd+3rd
222.92s 4th
327.62s 5th
282.29s 6th/Average
VAE 128.309s
97F tests, I2V + 6-step LoRA (posted in gallery) (no OOM yet):
1123s
1013s