LTX-2.3 Image to Video AudioSync Simple Workflow +T2V / V1, V2.1, Native, V3

Name: LTX-2.3 Image to Video AudioSync Simple Workflow +T2V / V1, V2.1, Native, V3
Rating: 5 (30 reviews)
Author: ukr8b3g201

Updated: Mar 10, 2026

tool

comfyui audiosync i2v ltx-2 ltx 2.3

Download (608.47 KB)

Verified: 17 hours ago

Other

Details

Type	Workflows
Stats	168 0
Reviews	Positive (13)
Published	Mar 10, 2026
Base Model	LTXV 2.3
Hash	AutoV2 7308424F40

1 File

About this version

LTX-2.3 Image to Video AudioSync Simple V3

Overrides the gemma-3-12b text encoder in TextGenerateLTX2Prompt with the new LoRA gemma-3-12b-it-abliterated_lora_rank64_bf16. Note that this does not make it uncensored.

The official Comfy video_ltx2_i2v_AudioSync workflow has been released and replaces the current native workflow. The two are functionally almost the same, but the official one may be better, so there is no longer any need to stick with the native workflow.

V3 also uses some memory-reduction custom nodes.

Test images and audio are included.

Required: ComfyUI 0.16.x

Requires audio data (such as MP3) and one image

Required: SageAttention

checkpoints

ltx-2.3-22b-dev-fp8.safetensors

text_encoders

gemma_3_12B_it_fp4_mixed.safetensors

loras

ltx-2.3-22b-distilled-lora-384.safetensors

gemma-3-12b-it-abliterated_lora_rank64_bf16.safetensors

latent_upscale_models

ltx-2.3-spatial-upscaler-x2-1.0.safetensors
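Before loading the workflow, it can help to confirm that each file listed above is in its folder. The following is a minimal sketch, not part of the workflow itself: the folder and file names are copied from the list above, while the `missing_models` helper and the `ComfyUI/models` location are assumptions for illustration and should be adjusted to your install.

```python
from pathlib import Path

# Folder and file names are copied from the V3 model list above.
REQUIRED_MODELS = {
    "checkpoints": ["ltx-2.3-22b-dev-fp8.safetensors"],
    "text_encoders": ["gemma_3_12B_it_fp4_mixed.safetensors"],
    "loras": [
        "ltx-2.3-22b-distilled-lora-384.safetensors",
        "gemma-3-12b-it-abliterated_lora_rank64_bf16.safetensors",
    ],
    "latent_upscale_models": ["ltx-2.3-spatial-upscaler-x2-1.0.safetensors"],
}


def missing_models(models_dir, required=REQUIRED_MODELS):
    """Return relative paths of required files not found under models_dir."""
    root = Path(models_dir)
    missing = []
    for folder, filenames in required.items():
        for name in filenames:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing


if __name__ == "__main__":
    # "ComfyUI/models" is an assumed location; change it to match your setup.
    for path in missing_models("ComfyUI/models"):
        print("missing:", path)
```

An empty result means every listed file was found. The same helper can be reused for the other versions by passing a different `required` dictionary.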

Custom Nodes

https://github.com/Lightricks/ComfyUI-LTXVideo

https://github.com/rgthree/rgthree-comfy

https://github.com/kijai/ComfyUI-KJNodes

https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite

LTX-2.3 Image to Video AudioSync Simple Native Workflow (v1.1)

Native workflow using only ComfyUI core nodes [2026/03/08]

Required: ComfyUI 0.16.4

Requires audio data (such as MP3) and one image

Based on the ComfyUI 0.16.4 template, with an audio-sync modification added.

No custom nodes are required, but the latest ComfyUI (0.16.4) is required.

An unexplained effect appears at the end of the output, and I have not found a way to fix it.

The ZIP file contains one test image and one audio file.

If you get an OOM error in VAE Decode (Tiled) on long clips, try lowering the temporal size. Lowering it too much may cause noise and ghosting, so expect some trial and error.
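The trial-and-error loop for the temporal size can be pictured as follows. This is only an illustrative sketch of the manual process, not how ComfyUI itself behaves: `decode_with_fallback`, its parameters, and the use of `MemoryError` as the out-of-memory signal are all assumptions, and no specific size values are implied by the workflow.

```python
def decode_with_fallback(decode, temporal_size, min_size, step):
    """Call decode(size), lowering size by step after each out-of-memory failure.

    Returns (result, size_used). Re-raises the last MemoryError once the next
    size would drop below min_size, since very small values risk noise and
    ghosting.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    size = temporal_size
    while True:
        try:
            return decode(size), size
        except MemoryError:
            # MemoryError stands in for whatever OOM error your setup raises.
            if size - step < min_size:
                raise
            size -= step
```

Here `decode` is any callable you supply that attempts the decode at a given temporal size. The `min_size` floor reflects the caution above about lowering the value too far.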

Set disable_TextGenerate to "false" to enable prompt enhancement.

Set disable_TextGenerate to "true" to bypass prompt enhancement.

The initial value of disable_TextGenerate is "false".

Set Disable_i2v to "true" for T2V.

When using TextGenerateLTX2Prompt (Prompt Enhancement), it may take some time to generate.
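The two switches described above can be summarized as a small truth table. The sketch below is purely illustrative: `describe_mode` is a hypothetical helper, not a node or API in the workflow, and the default of `False` for the I2V switch is an assumption (only the initial value of disable_TextGenerate is stated above).

```python
def describe_mode(disable_text_generate=False, disable_i2v=False):
    """Summarize what the two workflow switches select, per the notes above."""
    return {
        # Prompt enhancement runs unless disable_TextGenerate is "true".
        "prompt_enhancement": not disable_text_generate,
        # Setting the I2V switch to "true" selects text-to-video.
        "mode": "T2V" if disable_i2v else "I2V",
    }


if __name__ == "__main__":
    for text_off in (False, True):
        for i2v_off in (False, True):
            print(text_off, i2v_off, describe_mode(text_off, i2v_off))
```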

checkpoints

ltx-2.3-22b-dev-fp8.safetensors

text_encoders

gemma_3_12B_it_fp4_mixed.safetensors

loras

ltx-2.3-22b-distilled-lora-384.safetensors

latent_upscale_models

ltx-2.3-spatial-upscaler-x2-1.0.safetensors

No custom nodes required

Tested on: ComfyUI 0.16.4, Python 3.12.12, PyTorch 2.10.0+cu130

GeForce RTX 5060 Ti 16GB, 64GB system memory

V2.1: Added T2V switch [2026/03/08]

LTX-2.3 Image to Video AudioSync Simple Workflow (v2.1)

Requires one image and audio data.
Uses the ComfyUI template models, except for the checkpoint (ltx-2.3-22b-dev-fp8.safetensors, 29.1GB).
It should work in most setups because it follows the ComfyUI template workflow.

Added T2V switch (2026/03/08)

Set disable_i2v to "true" for T2V. If Image Latency Switch is "true", the size and aspect ratio of the specified image are still used, so it is better to set Image Latency Switch to "false", which switches to EmptyLTXVLatent.

TextGenerateLTX2Prompt performs image analysis and prompt enhancement. It is memory-efficient because it uses the Gemma-3-12B text encoder as the LLM.

NSFW content may not be handled by prompt enhancement.

If it doesn't work as expected, try bypassing TextGenerateLTX2Prompt.

checkpoints

ltx-2.3-22b-dev-fp8.safetensors

text_encoders

gemma_3_12B_it_fp4_mixed.safetensors

loras

ltx-2.3-22b-distilled-lora-384.safetensors

latent_upscale_models

ltx-2.3-spatial-upscaler-x2-1.0.safetensors

MelBandRoFormer_comfy

MelBandRoformer_fp32.safetensors

Custom Nodes

https://github.com/Lightricks/ComfyUI-LTXVideo

https://github.com/rgthree/rgthree-comfy

https://github.com/kijai/ComfyUI-KJNodes

https://github.com/kijai/ComfyUI-MelBandRoFormer

https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite

https://github.com/pixelpainter/comfyui-mute-bypass-by-ID

Tested on: ComfyUI 0.16.0, Python 3.12.12, PyTorch 2.10.0+cu130

GeForce RTX 5060 Ti 16GB, 64GB system memory

LTX-2 Image to Video AudioSync Simple Workflow (V1)

A simple workflow that incorporates AudioSync into the ComfyUI video_ltx2_i2v template workflow.

If the audio is longer than 60 seconds, the image may be distorted.
2D and anime-style images may be distorted.
I have not tested videos with a lot of movement; for those, adjust the prompts or change the LoRA settings.
It uses the LoRA ltx-2-19b-ic-lora-lipdubbing.safetensors to promote lip sync; if you need something else, replace it with a camera LoRA or similar.
May not work in low-memory environments.

Tested on ComfyUI 0.15.1, GeForce RTX 5060 Ti 16GB, 64GB system RAM

Generation takes over 20 minutes for a 60-second video.