LTX 2.3 (T2V) - Simple Workflow for Low-VRAM GPUs (8GB or 10GB VRAM)

This article details a straightforward Text-to-Video (T2V) workflow using LTX 2.3, optimized for users with limited GPU resources. It was tested and confirmed to work on a system with 10GB of VRAM and 32GB of RAM. During testing, ComfyUI's memory usage fluctuated between 8GB and 10GB. The fluctuation was due to running Windows 11 with a dual-monitor setup while other applications (video playback and Chrome) were active at the same time. Under these conditions, generating a 10-second video took approximately 7 minutes.

Workflow Components & Checkpoints:

The following checkpoints, VAEs (Variational Autoencoders), and CLIP models were utilized in this workflow:

Unet (GGUF): https://huggingface.co/unsloth/LTX-2.3-GGUF/blob/main/ltx-2.3-22b-dev-Q4_K_M.gguf
Video VAE: https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_video_vae_bf16.safetensors
Audio VAE: https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_audio_vae_bf16.safetensors
Dual CLIP:
- https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it_fp4_mixed.safetensors
- https://huggingface.co/Kijai/LTX2.3_comfy/blob/main/text_encoders/ltx-2.3_text_projection_bf16.safetensors
Latent Upscale Model: https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-spatial-upscaler-x2-1.1.safetensors
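
The links above point to Hugging Face file pages, not to the files themselves. As a convenience, here is a minimal sketch that turns each link into a direct-download URL (swapping /blob/ for /resolve/) and pairs it with a destination folder. The folder names (unet, vae, text_encoders, latent_upscale_models) and the ComfyUI/models root are assumptions about a typical ComfyUI install, not something this workflow dictates, so adjust them to match your setup. The script only prints a plan; it does not download anything.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

HF = "https://huggingface.co"

# Link from the list above -> ASSUMED sub-folder under the ComfyUI models directory.
MODEL_FILES = {
    f"{HF}/unsloth/LTX-2.3-GGUF/blob/main/ltx-2.3-22b-dev-Q4_K_M.gguf": "unet",
    f"{HF}/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_video_vae_bf16.safetensors": "vae",
    f"{HF}/Kijai/LTX2.3_comfy/blob/main/vae/LTX23_audio_vae_bf16.safetensors": "vae",
    f"{HF}/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/"
    "gemma_3_12B_it_fp4_mixed.safetensors": "text_encoders",
    f"{HF}/Kijai/LTX2.3_comfy/blob/main/text_encoders/"
    "ltx-2.3_text_projection_bf16.safetensors": "text_encoders",
    f"{HF}/Lightricks/LTX-2.3/blob/main/"
    "ltx-2.3-spatial-upscaler-x2-1.1.safetensors": "latent_upscale_models",
}


def to_download_url(blob_url: str) -> str:
    """Convert a Hugging Face file-page URL (/blob/) to a direct-download URL (/resolve/)."""
    return blob_url.replace("/blob/", "/resolve/", 1)


def filename_of(url: str) -> str:
    """Return the last path component of the URL, i.e. the file name."""
    return PurePosixPath(urlparse(url).path).name


def download_plan(models_root: str = "ComfyUI/models") -> list[tuple[str, str]]:
    """Build (download_url, destination_path) pairs; models_root is an assumed location."""
    return [
        (to_download_url(url), f"{models_root}/{folder}/{filename_of(url)}")
        for url, folder in MODEL_FILES.items()
    ]


if __name__ == "__main__":
    for url, dest in download_plan():
        print(f"{url}\n  -> {dest}")
```

Each printed URL can be pasted into a browser or a download tool of your choice, with the file saved to the path shown beneath it.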

Feedback & Questions

I hope this workflow proves helpful for those working with limited GPU resources. If you have any questions or encounter any issues, please leave a comment below.