
HuMo for Wan

Name: HuMo for Wan
Rating: 5 (14 reviews)
Author: Cyph3r


Updated: Sep 13, 2025


Download

Variants: 1 (fp16 SafeTensor)
File: Wan2_1-HuMo-14B_fp16.safetensors
Precision: half precision, best balance
Size: 31.77 GB
Verified: 10 months ago
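Before loading a file of this size, it can help to confirm that the tensors really are stored in half precision. The sketch below reads only the safetensors header (an 8-byte little-endian length followed by a JSON table) and counts tensors per dtype, so no weights are loaded. The function names are illustrative and do not come from HuMo or from this page.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the tensor table from a .safetensors file without reading weights."""
    with open(path, "rb") as handle:
        # The first 8 bytes hold the header length as an unsigned 64-bit LE integer.
        (header_len,) = struct.unpack("<Q", handle.read(8))
        header = json.loads(handle.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def dtype_counts(path):
    """Count tensors per dtype string, for example "F16" for half precision."""
    return Counter(entry["dtype"] for entry in read_safetensors_header(path).values())
```

For an fp16 checkpoint, dtype_counts("Wan2_1-HuMo-14B_fp16.safetensors") would be expected to be dominated by "F16", although some files keep a few tensors in other dtypes.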

Details

Type: Checkpoint
Stats: 116
Reviews: Positive (12)
Published: Sep 13, 2025
Base Model: Wan Video 14B t2v
Hash (AutoV2): 51837C1CE0
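The AutoV2 hash listed above can be used to check a finished download. The sketch below assumes that AutoV2 is the first ten hex characters of the file's SHA-256 digest, which is how this hash type is generally understood to be computed. That is an assumption, not something this page states. The function names are illustrative.

```python
import hashlib


def autov2_hash(path, chunk_size=1 << 20):
    """Return the first 10 hex characters (uppercase) of the file's SHA-256.

    Assumption: this matches the AutoV2 hash shown on the model page.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Stream in chunks so a ~32 GB checkpoint never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:10].upper()


def matches_listing(path, expected="51837C1CE0"):
    """Compare a local file against the hash value listed on this page."""
    return autov2_hash(path) == expected.upper()
```

If the assumption holds, matches_listing("Wan2_1-HuMo-14B_fp16.safetensors") should return True for an intact download. A mismatch suggests a corrupted or different file.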

Creator: Cyph3r (joined Jan 8, 2023)

License: Apache 2.0

HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

✨ Key Features

HuMo is a unified, human-centric video generation framework designed to produce high-quality, fine-grained, and controllable human videos from multimodal inputs, including text, images, and audio. It supports strong text prompt following, consistent subject preservation, and synchronized audio-driven motion.

VideoGen from Text-Image - Customize character appearance, clothing, makeup, props, and scenes using text prompts combined with reference images.
VideoGen from Text-Audio - Generate audio-synchronized videos solely from text and audio inputs, removing the need for image references and enabling greater creative freedom.
VideoGen from Text-Image-Audio - Achieve the highest level of customization and control by combining text, image, and audio guidance.
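The three modes above differ only in which inputs accompany the text prompt, so the choice can be sketched as a small dispatch on what is supplied. HuMoInputs and select_mode are hypothetical names used for illustration; they are not part of the HuMo codebase. Text-only requests are rejected here only because this page lists no such mode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HuMoInputs:
    """Hypothetical container for one generation request (not a HuMo API)."""
    prompt: str
    reference_image: Optional[str] = None  # path to a reference image, if any
    audio: Optional[str] = None            # path to an audio clip, if any


def select_mode(inputs: HuMoInputs) -> str:
    """Map the supplied inputs to one of the three modes listed on this page."""
    if not inputs.prompt.strip():
        raise ValueError("all three modes require a text prompt")
    has_image = inputs.reference_image is not None
    has_audio = inputs.audio is not None
    if has_image and has_audio:
        return "text-image-audio"
    if has_image:
        return "text-image"
    if has_audio:
        return "text-audio"
    raise ValueError("provide a reference image, an audio clip, or both")
```

For example, select_mode(HuMoInputs("a singer on stage", audio="song.wav")) returns "text-audio".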

The examples and models are reuploaded here for convenience from the following sources:
https://huggingface.co/bytedance-research/HuMo
https://github.com/Phantom-video/HuMo

The model is compatible with both 480P and 720P resolutions; 720P inference gives much better quality.