
LTX 2.3 Lip-Sync Workflow – 3 min for a 10 s video, walk-and-talk supported

Updated: Apr 7, 2026

Type: Workflow

Base model: LTXV 2.3

Published: Apr 7, 2026

Hash (AutoV2): 1FE252F9BC

 

Try these workflows online first:

 

Workflow: Lip-Sync Speaking/Singing – LTX2.3 Image-to-Digital Human – Auto Expansion – Module Optimization – No Subtitles  

Experience link: https://www.runninghub.ai/post/2038618856104665090/?inviteCode=rh-v1401

 

Workflow: Text-to-Lip-Sync Video – Speaking/Singing – LTX2.3 Text-to-Digital Human – No Subtitles – Module Optimization  

Experience link: https://www.runninghub.ai/post/2038618886479814658/?inviteCode=rh-v1401

 

Workflow: LTX2.3 – Fully Automated Prompt – Text-to-Video  

Experience link: https://www.runninghub.ai/post/2031218445026594817/?inviteCode=rh-v1401

 

Workflow: LTX2.3 – Fully Automated Prompt – Image-to-Video – Modular Tuned Edition  

Experience link: https://www.runninghub.ai/post/2031218459471777794/?inviteCode=rh-v1401

 

Workflow: LTX2.3 – Fully Automated Prompt – First/Middle/Last Frame Three-Image-to-Video  

Experience link: https://www.runninghub.ai/post/2035325465820405761/?inviteCode=rh-v1401

 

Name: LTX 2.3 Image-to-Lip-Sync Meme Workflow (Modular / Ultra-Fast / Action-Supported)


 

Introduction:

Built on the open-source LTX 2.3 model, optimized for image-to-lip-sync videos. It allows any image (people/animals/medium-close-up) to accurately sing or speak along with the uploaded audio, while controlling actions (walking, waving, jumping, etc.) via prompts.
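For illustration, an action-controlling prompt in this setup might read something like the following (hypothetical wording, not taken from the workflow itself):

```
A young woman walks slowly toward the camera while speaking,
waving one hand casually; medium shot, steady camera, natural lighting.
```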

 


 

Core Advantages:

- Extremely fast: a 10-second video at 1280 resolution takes only 3–6 minutes, and the workflow runs even faster the second time

- 5-way batching: tested running five workflows simultaneously, producing a dozen or more finished videos per day

- Modular grouping: upload → dimension setting → audio → latent creation → upscale; a clear layout that is easy to modify

- With a fixed shot, generated clips are almost indistinguishable from the original footage; well suited to memes, entertainment, and VTubers

- Supports MP3 audio (if an error occurs, re-export the file once from CapCut)

- Avoid prompts such as "look down" or "turn around", as they break character consistency
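The five-way batching described above can be sketched with a thread pool. `run_workflow` here is a hypothetical placeholder standing in for submitting one lip-sync job (image + audio) to the actual pipeline:

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(job_id: int) -> str:
    # Placeholder: in practice this would submit one lip-sync job
    # to the LTX 2.3 workflow and wait for the rendered video.
    return f"video_{job_id}.mp4"


# Run five workflow instances side by side, as tested above.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(run_workflow, range(5)))

print(results)  # five finished video filenames, in submission order
```

`pool.map` preserves submission order, so results line up with the jobs even though they finish asynchronously.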

 


 

Workflow Structure:

1. Upload image (medium/close-up, clear lip movements)

2. Set dimensions (longest side 1280)

3. Upload audio (10-15 seconds recommended)

4. The Latent module references both the image and the audio, scaling them together

5. Final upscale and output
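Step 2 above (longest side 1280) can be sketched as a small helper. The multiple-of-32 rounding is an assumption on my part (it is common for video diffusion models) and should be checked against the workflow's actual resize node:

```python
def fit_longest_side(width: int, height: int,
                     target: int = 1280, multiple: int = 32):
    """Scale (width, height) so the longest side becomes `target`,
    rounding each side to the nearest `multiple` (assumed here)."""
    scale = target / max(width, height)

    def snap(v: int) -> int:
        # Round to the nearest multiple, never dropping below one multiple.
        return max(multiple, round(v * scale / multiple) * multiple)

    return snap(width), snap(height)


# A 4:3 source scales cleanly to the 1280 longest side:
print(fit_longest_side(1600, 1200))  # -> (1280, 960)
```

Feeding the snapped dimensions into the dimension-setting module keeps the latent creation step from rejecting odd sizes.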

 


 

Results Showcase:

This workflow has been used to create the "round-headed elderly meme singing" video (see example). Speaking lip-sync is equally strong; paired with Qwen (Tongyi Qianwen) voice design, it can drive digital humans.

 


 

Note:

Among open-source models, LTX 2.3 comes closest to cinema-grade texture and color control.

 


 
