LTX-2.3 Dev Audio+Image To Video (GGUF)

Type: Workflows

Published: Mar 22, 2026

Base Model: LTXV 2.3

Hash (AutoV2): 0E0D22AD82

Base workflow for audio+image-to-video with the Dev model, built to use as little VRAM as possible.

It can also generate text-to-video with an audio reference: switch the red Boolean node to TRUE.

I suggest leaving the prompt as-is unless you want to prompt for a specific motion or action.

Prompt:

"Transform this static image into a high-quality video with realistic facial expressions and realistic motion.

Perfect lip-sync to the attached audio."

FILES:

OPTIONAL: Kijai's fp8 scaled model (replaces the GGUF entirely; requires the Load Diffusion Model node instead of the Unet Loader node)

https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/diffusion_models
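The loader swap can be done by hand in the ComfyUI editor. For anyone scripting it, here is a minimal sketch that assumes the workflow was exported in API format and that the GGUF model is loaded by ComfyUI-GGUF's `UnetLoaderGGUF` node, which gets replaced by core ComfyUI's `UNETLoader` ("Load Diffusion Model"). The workflow fragment and both file names are hypothetical placeholders, not taken from this workflow.

```python
import json

# Hypothetical file names: substitute the files you actually downloaded.
GGUF_NAME = "LTX-2.3-dev-Q4_K_M.gguf"            # hypothetical GGUF quant
FP8_NAME = "ltx-2.3-dev-fp8-scaled.safetensors"  # hypothetical fp8 scaled file


def example_workflow() -> dict:
    """A tiny, hypothetical API-format fragment: GGUF loader feeding a LoRA loader."""
    return {
        "12": {"class_type": "UnetLoaderGGUF", "inputs": {"unet_name": GGUF_NAME}},
        "13": {
            "class_type": "LoraLoaderModelOnly",
            "inputs": {
                "model": ["12", 0],  # link: output slot 0 (MODEL) of node "12"
                "lora_name": "ltx-2.3-22b-distilled-lora-384.safetensors",
                "strength_model": 1.0,
            },
        },
    }


def swap_gguf_loader(workflow: dict, model_name: str) -> int:
    """Turn every UnetLoaderGGUF node into a core UNETLoader ("Load Diffusion Model").

    Node ids are kept, so links such as ["12", 0] still point at the loader's
    MODEL output. Returns how many nodes were swapped.
    """
    swapped = 0
    for node in workflow.values():
        if node.get("class_type") == "UnetLoaderGGUF":
            node["class_type"] = "UNETLoader"
            node["inputs"] = {"unet_name": model_name, "weight_dtype": "default"}
            swapped += 1
    return swapped


if __name__ == "__main__":
    wf = example_workflow()
    print(swap_gguf_loader(wf, FP8_NAME))
    print(json.dumps(wf["12"], indent=2))
```

Keeping the node id unchanged is the design choice that matters here: every downstream link refers to the loader by id and output slot, and both loaders expose MODEL at slot 0, so nothing else in the graph needs rewiring.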

Dev GGUF (the distilled GGUFs are in the same repo)

https://huggingface.co/unsloth/LTX-2.3-GGUF/tree/main

Gemma 3 12B FP4 text encoder

https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it_fp4_mixed.safetensors

Audio VAE

https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_audio_vae_bf16.safetensors

Video VAE

https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_video_vae_bf16.safetensors

Text Projection text encoder

https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/text_encoders

Distill LoRA

https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-22b-distilled-lora-384.safetensors

Upscaler

https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-spatial-upscaler-x2-1.1.safetensors
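The single-file links above can also be fetched with a script. This is a minimal sketch, assuming the `huggingface_hub` package (its `hf_hub_download` function) is installed. The three "tree" links (fp8 model, GGUF, text projection) point at folders, so those files are left for you to pick by hand. The ComfyUI `models/` subfolder assigned to each file is an assumption; match it to the loader nodes in your copy of the workflow. Files are downloaded into the Hugging Face cache and then copied flat into the target folder, which avoids the repo's subpaths (such as `split_files/text_encoders/`) ending up on disk but temporarily doubles disk usage.

```python
import shutil
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

# Single-file ("blob") links from the FILES list.
FILE_URLS = [
    "https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it_fp4_mixed.safetensors",
    "https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_audio_vae_bf16.safetensors",
    "https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_video_vae_bf16.safetensors",
    "https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-22b-distilled-lora-384.safetensors",
    "https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-spatial-upscaler-x2-1.1.safetensors",
]

# ASSUMPTION: ComfyUI models/ subfolder for each file; check your loader nodes.
TARGET_FOLDER = {
    "gemma_3_12B_it_fp4_mixed.safetensors": "text_encoders",
    "LTX23_audio_vae_bf16.safetensors": "vae",
    "LTX23_video_vae_bf16.safetensors": "vae",
    "ltx-2.3-22b-distilled-lora-384.safetensors": "loras",
    "ltx-2.3-spatial-upscaler-x2-1.1.safetensors": "latent_upscale_models",
}


def parse_blob_url(url: str) -> tuple[str, str, str]:
    """Split https://huggingface.co/<owner>/<repo>/blob/<rev>/<path> into (repo_id, rev, path)."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a single-file blob URL: {url}")
    return f"{parts[0]}/{parts[1]}", parts[3], "/".join(parts[4:])


def download_all(models_dir: str) -> None:
    """Fetch each file into the Hugging Face cache, then copy it to models_dir/<folder>/."""
    from huggingface_hub import hf_hub_download  # needs `pip install huggingface_hub`

    for url in FILE_URLS:
        repo_id, revision, path_in_repo = parse_blob_url(url)
        name = PurePosixPath(path_in_repo).name
        dest = Path(models_dir) / TARGET_FOLDER[name]
        dest.mkdir(parents=True, exist_ok=True)
        cached = hf_hub_download(repo_id=repo_id, filename=path_in_repo, revision=revision)
        shutil.copy2(cached, dest / name)


if __name__ == "__main__":
    # Dry run: show where each file would go without downloading anything.
    for url in FILE_URLS:
        repo_id, revision, path_in_repo = parse_blob_url(url)
        name = PurePosixPath(path_in_repo).name
        print(f"{repo_id}@{revision}: {path_in_repo} -> models/{TARGET_FOLDER[name]}/")
```

Running the file only prints the plan. To actually download, call `download_all("/path/to/ComfyUI/models")` with your own install path.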