Wan2.1 InfiniteTalk V2V Lipsync (GGUF) workflows

Updated: Mar 31, 2026

Tags: tool, wan, lipsync, v2v, infinitetalk

Type

Workflows

Stats

248

Reviews

0

Published

Feb 15, 2026

Base Model

Wan Video 14B i2v 480p

Hash

AutoV2
0D8843FEF1

Important Notice

Optimized to work with the latest version of ComfyUI (v0.18.1 + Frontend 1.42.8).

Overview

This workflow uses Wan2.1 InfiniteTalk to perform native V2V lip sync.
Even if the input video is long, the workflow automatically repeats the extension process as many times as needed.

What This Workflow Does

Using the automatic segmentation feature of Florence2Run + SAM2, a face mask is generated, and the masked face region is then re-rendered with InfiniteTalk.
This keeps motion outside the face faithful to the original video, while maintaining facial consistency and applying accurate lip sync.

Notes

If you encounter the following error:

RuntimeError: Input type (float) and bias type (struct c10::Half) should be the same

please change the audio_encoder from:

wav2vec2-chinese-base_fp16.safetensors
to
wav2vec2-chinese-base_fp32.safetensors

You can download it from the same location as the fp16 version.

Depending on the original video's frame count, the output may be rounded down, resulting in the video being 1–3 frames shorter.

The length is calculated from the latent frame count n using the formula:

(n - 1) * 4 + 1

Because of this rule, it is not possible to generate more frames than exist in the source video.

For example, if the final chunk has 14 frames remaining, the selectable lengths would be 13 or 17.
However, since frames 15–17 do not exist in the source video, they cannot be generated.
As a result, the length is rounded down.
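The rounding behavior described above can be sketched in a few lines of Python. The helper name `max_valid_length` is hypothetical (not part of the workflow); it simply finds the largest length of the form (n - 1) * 4 + 1 that fits within the remaining source frames:

```python
def max_valid_length(remaining_frames: int) -> int:
    """Largest output length of the form (n - 1) * 4 + 1 that does
    not exceed the remaining source-frame count."""
    if remaining_frames < 1:
        raise ValueError("need at least one source frame")
    # Solve (n - 1) * 4 + 1 <= remaining_frames for the largest integer n.
    n = (remaining_frames - 1) // 4 + 1
    return (n - 1) * 4 + 1

# A final chunk with 14 source frames rounds down to 13, since the
# next valid length (17) would require frames the source lacks.
print(max_valid_length(14))  # -> 13
print(max_valid_length(17))  # -> 17
```

This mirrors the example in the text: for a 14-frame final chunk, the candidate lengths are 13 (n = 4) and 17 (n = 5), and only 13 fits.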

If anyone has a good idea to improve this limitation, suggestions are welcome.