
FramePack (by Lvmin Zhang, built on Hunyuan Video) now supports first- and last-frame conditioning. Best practices with Kijai's nodes.

Verified: SafeTensor
Type: Checkpoint Merge
Stats: 1,237 · Reviews: 0
Published: Apr 19, 2025
Base Model: Hunyuan Video
Hash (AutoV2): 42E1A92CD4
Tencent Hunyuan is licensed under the Tencent Hunyuan Community License Agreement, Copyright © 2024 Tencent. All Rights Reserved. The trademark rights of “Tencent Hunyuan” are owned by Tencent or its affiliate.
Powered by Tencent Hunyuan

Update (04/21): first/last-frame keyframe reference is now supported in ComfyUI.

nirvash’s repository for keyframe support (ComfyUI, no extra weights required):

nirvash/ComfyUI-FramePackWrapper

[ The WEBP example images can be dragged and dropped directly into ComfyUI; the workflow is embedded in them. ]

[ You can also download the attachment package on the right; its example_workflows directory contains the workflows. ]

Feature

  • Set end frame

  • Assign weighted keyframes

  • Use different prompts per section

Based on kijai's ComfyUI-FramePackWrapper:

https://github.com/kijai/ComfyUI-FramePackWrapper


End-frame support in a PyTorch Gradio WebUI:

FramePack_SE by TTPlanetPig, based on lllyasviel/FramePack


Play with a big video model the way you play with an image model: lllyasviel's FramePack & Kijai's nodes

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Authors: Lvmin Zhang, Maneesh Agrawala

Stanford University

Paper | Code

ComfyUI Wrapper for FramePack by lllyasviel

Best practice: ComfyUI nodes from kijai/ComfyUI-FramePackWrapper

FramePack

  • Diffuse thousands of frames at a full 30 fps with 13B models using 6GB of laptop GPU memory (a toy sketch of the packing idea follows this list).

  • Finetune 13B video model at batch size 64 on a single 8xA100/H100 node for personal/lab experiments.

  • Personal RTX 4090 generates at speed 2.5 seconds/frame (unoptimized) or 1.5 seconds/frame (teacache).

  • No timestep distillation (only CFG distillation, so image quality stays high).

  • Video diffusion, but feels like image diffusion.

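For intuition about the memory claim above, here is a toy sketch of the packing idea (illustrative only, not the paper's code; the real model compresses frames with different patchify kernels and the constants differ): frames further from the one being predicted get fewer tokens, so the total context length is bounded by a geometric series instead of growing with video length.

# Toy illustration of FramePack-style context packing (numbers are made up).
def packed_token_budget(num_past_frames, tokens_full_frame=1536, compression=2):
    """Token budget per past frame: the most recent frame keeps full resolution,
    each older frame gets `compression`x fewer tokens; frames too old to matter
    are dropped."""
    budget = []
    for k in range(num_past_frames):
        tokens = tokens_full_frame // compression ** k
        if tokens < 1:
            break
        budget.append(tokens)
    return budget

for n in (4, 16, 64, 1800):
    print(f"{n:5d} past frames -> {sum(packed_token_budget(n)):5d} context tokens")
# The total converges to roughly 2 * tokens_full_frame, which is why the cost of
# predicting the next frame (or section) stays flat however long the video gets.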

Mostly working, took some liberties to make it run faster.

Uses all the native models for the text encoders, VAE, and sigclip:

https://huggingface.co/Comfy-Org/HunyuanVideo_repackaged/tree/main/split_files

https://huggingface.co/Comfy-Org/sigclip_vision_384/tree/main

And the transformer model itself is either autodownloaded from here:

https://huggingface.co/lllyasviel/FramePackI2V_HY/tree/main

to ComfyUI\models\diffusers\lllyasviel\FramePackI2V_HY

Or from a single file in ComfyUI\models\diffusion_models:

https://huggingface.co/Kijai/HunyuanVideo_comfy/blob/main/FramePackI2V_HY_fp8_e4m3fn.safetensors
https://huggingface.co/Kijai/HunyuanVideo_comfy/blob/main/FramePackI2V_HY_bf16.safetensors
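If you prefer to fetch the files up front instead of relying on auto-download, a minimal sketch with huggingface_hub is below. The ComfyUI path is an assumed example; point it at your own install, and pick either the diffusers-style repo or one of the single-file checkpoints.

# Minimal sketch: pre-download the FramePack transformer for the ComfyUI wrapper.
# COMFYUI_DIR is an assumed example path; change it to your own ComfyUI install.
from pathlib import Path
from huggingface_hub import snapshot_download, hf_hub_download

COMFYUI_DIR = Path(r"C:\ComfyUI")

# Option A: full repo, same layout the wrapper auto-downloads to
snapshot_download(
    repo_id="lllyasviel/FramePackI2V_HY",
    local_dir=COMFYUI_DIR / "models" / "diffusers" / "lllyasviel" / "FramePackI2V_HY",
)

# Option B: single-file checkpoint (fp8 shown; use the bf16 file if you prefer)
hf_hub_download(
    repo_id="Kijai/HunyuanVideo_comfy",
    filename="FramePackI2V_HY_fp8_e4m3fn.safetensors",
    local_dir=COMFYUI_DIR / "models" / "diffusion_models",
)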

Requirements

Note that this repo is functional desktop software with a minimal standalone high-quality sampling system and memory management.

Start with this repo before you try anything else!

lllyasviel/FramePack: Let's make video diffusion practical!

Requirements:

  • An Nvidia GPU in the RTX 30XX, 40XX, or 50XX series that supports fp16 and bf16. The GTX 10XX/20XX series are not tested.

  • Linux or Windows operating system.

  • At least 6GB GPU memory.

To generate a 1-minute video (60 seconds) at 30fps (1800 frames) using the 13B model, the minimum required GPU memory is 6GB. (Yes, 6 GB, not a typo. Laptop GPUs are okay.)
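As a quick pre-flight check (a sketch, not part of the repo), a few lines of PyTorch can confirm the GPU meets the requirements above:

# Quick sanity check for the requirements above (illustrative, not from the repo).
import torch

assert torch.cuda.is_available(), "FramePack needs an Nvidia GPU with CUDA."
props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024 ** 3
print(f"{props.name}: {vram_gb:.1f} GB VRAM, bf16 support: {torch.cuda.is_bf16_supported()}")
assert vram_gb >= 6, "At least ~6 GB of GPU memory is required."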

About speed: on my RTX 4090 desktop it generates at 2.5 seconds/frame (unoptimized) or 1.5 seconds/frame (with teacache). On laptop GPUs such as a 3070 Ti or 3060, it is about 4x to 8x slower.
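To put those numbers in context (simple arithmetic from the figures above): a 60-second clip at 30 fps is 1800 frames, so an RTX 4090 needs roughly 1800 × 1.5 s ≈ 45 minutes with teacache, or 1800 × 2.5 s ≈ 75 minutes unoptimized.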

In any case, you will directly see the generated frames since it is next-frame(-section) prediction. So you will get lots of visual feedback before the entire video is generated.

Cite

@article{zhang2025framepack,
    title={Packing Input Frame Contexts in Next-Frame Prediction Models for Video Generation},
    author={Lvmin Zhang and Maneesh Agrawala},
    journal={Arxiv},
    year={2025}
}

Kijai's Models Repository

Kijai/HunyuanVideo_comfy · Hugging Face