V2.3 LTX-2.3 DEV & Distilled Video with Audio
Image to Video and a Text to Video workflow; both can use your own prompts or Ollama-generated/enhanced prompts.
Works with the latest LTX 2.3 Distilled model (8 steps, CFG=1) or the Dev model (20 steps, CFG=3.5).
Downloads:
LTX-2.3 Distilled & Dev Models (fp8_scaled): https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/diffusion_models
Text encoder 1 (fp8_e4m3fn, same as LTX-2): https://huggingface.co/GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn/tree/main
Text encoder 2 (projection_bf16): https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/text_encoders
Video & audio VAE: https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/vae
Loras:
Spatial upscaler (x2-1.0): https://huggingface.co/Lightricks/LTX-2.3/tree/main
Distilled Lora for upscaler (lora.384): https://huggingface.co/Lightricks/LTX-2.3/tree/main
Detailer Lora (same as LTX-2): https://huggingface.co/Lightricks/LTX-2-19b-IC-LoRA-Detailer/tree/main
Ollama Model (prompt only, fast): https://ollama.com/mirage335/Llama-3-NeuralDaredevil-8B-abliterated-virtuoso
Alternative model with vision (reads input image + prompt, slower): https://ollama.com/huihui_ai/qwen3-vl-abliterated
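If you prefer scripting the V2.3 downloads, the list above can first be turned into a small plan. A minimal sketch in Python; the repo/subfolder strings come from the links above, while the ComfyUI target folders are my assumptions based on a standard install layout, so adjust them to yours:

```python
# Map each V2.3 download source (repo, plus subfolder where the link has one)
# to an assumed ComfyUI target folder, relative to the ComfyUI root.
V23_DOWNLOADS = {
    "Kijai/LTX2.3_comfy/diffusion_models":   "models/diffusion_models",      # Distilled & Dev (fp8_scaled)
    "GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn":  "models/text_encoders",         # text encoder 1 (fp8_e4m3fn)
    "Kijai/LTX2.3_comfy/text_encoders":      "models/clip",                  # text encoder 2 (projection_bf16)
    "Kijai/LTX2.3_comfy/vae":                "models/vae",                   # video & audio VAE
    "Lightricks/LTX-2.3":                    "models/latent_upscale_models", # spatial upscaler + its distilled Lora
    "Lightricks/LTX-2-19b-IC-LoRA-Detailer": "models/loras",                 # Detailer Lora
}

# Print the plan so you can check each target before downloading anything.
for source, target in V23_DOWNLOADS.items():
    print(f"{source:40s} -> {target}")
```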
V1.5 LTX-2 DEV Video with Audio including latest 🅛🅣🅧 Multimodal Guider
Image to Video and a Text to Video workflow, both can use own Prompts or Ollama generated/enhanced prompts.
Replaced the Guider node with the latest Multimodal Guider node; see details in the WF notes or here: https://ltx.io/model/model-blog/ltx-2-better-control-for-real-workflows Previously there was a single CFG parameter for both audio and video. With the Multimodal Guider, audio and video can now be tweaked separately, with even more parameters.
Added a Power Lora Loader node to inject further Loras.
Use the Image to Video Adapter Lora to improve motion for I2V: https://huggingface.co/MachineDelusions/LTX-2_Image2Video_Adapter_LoRa/tree/main
Replaced a node so the ComfyMath custom nodes are no longer required.
V1.0 LTX-2 DEV Video with Audio:
Image to Video and a Text to Video workflow with your own prompts or Ollama-generated/enhanced prompts.
Set up for the LTX-2 Dev model.
Uses the Detailer Lora for better quality and the LTX tiled VAE to avoid OOM errors and visible grid artifacts.
Two-pass rendering (motion + upscale); the upscale pass uses the distilled and spatial upscaler Loras.
Set up with the latest LTXVNormalizingSampler to improve video and audio quality.
Text to Video can use dynamic prompts with wildcards.
I use these launch parameters for ComfyUI to avoid OOM (my setup: 16 GB VRAM / 64 GB RAM):
--lowvram --cache-none --reserve-vram 6 --preview-method none
=> OBSOLETE with the latest ComfyUI updates, which brought better memory management.
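For reference, the flags above slot into the usual ComfyUI launch line. A config-fragment sketch, assuming a source checkout started via main.py (on recent ComfyUI builds the flags are no longer needed):

```shell
# Launch ComfyUI with the OOM-avoidance flags listed above.
# The python command and the path to main.py depend on your install.
python main.py --lowvram --cache-none --reserve-vram 6 --preview-method none
```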
Download Files (Workflow V1.0 and V1.5):
Find the Model/Lora Loader nodes inside the Sampler subgraph node.
- LTX2 Dev Model (dev_Fp8): https://huggingface.co/Lightricks/LTX-2/tree/main
- Detailer Lora: https://huggingface.co/Lightricks/LTX-2-19b-IC-LoRA-Detailer/tree/main
- Distilled (lora-384) & Spatial upscaler Lora: https://huggingface.co/Lightricks/LTX-2/tree/main
- VAE (already included in the dev_fp8 model above, but needed if you use GGUF models): https://huggingface.co/Lightricks/LTX-2/tree/main/vae
- Text encoder (fp8_e4m3fn): https://huggingface.co/GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn/tree/main
- Image to Video Adapter Lora (more motion with I2V): https://huggingface.co/MachineDelusions/LTX-2_Image2Video_Adapter_LoRa/tree/main
- Ollama Models:
Prompt-only model (fast): https://ollama.com/goonsai/josiefied-qwen2.5-7b-abliterated-v2
Alternative model with vision (reads input image + prompt; slower; can do reasoning by enabling "think" in the Ollama generate node): https://ollama.com/huihui_ai/qwen3-vl-abliterated
Other uncensored models I have tested:
27B model with vision; very slow, but has broad knowledge: https://ollama.com/mdq100/Gemma3-Instruct-Abliterated
Small, very fast model (reads prompt only): https://ollama.com/artifish/llama3.2-uncensored
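Outside of ComfyUI, the same prompt enhancement can be reproduced against a local Ollama server through its standard `/api/generate` endpoint. A minimal sketch, standard library only; the default model is taken from the list above, and the instruction text wrapped around the user prompt is my own illustration, not what the workflow's node sends:

```python
import json
import urllib.request

def build_generate_payload(user_prompt: str,
                           model: str = "artifish/llama3.2-uncensored",
                           think: bool = False) -> dict:
    """Build the request body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": ("Rewrite this idea as one detailed, cinematic video prompt "
                   "covering subject, motion, camera and audio: " + user_prompt),
        "stream": False,  # return one JSON object instead of a token stream
        "think": think,   # enable on reasoning-capable models (see note above)
    }

def enhance_prompt(user_prompt: str,
                   host: str = "http://localhost:11434", **kwargs) -> str:
    """Send the payload to a locally running Ollama server and return the text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_generate_payload(user_prompt, **kwargs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Usage: `enhance_prompt("a cat surfing a wave")` returns the enhanced prompt string, which you can paste into the workflow's prompt field.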
Save Location:
📂 ComfyUI/
└── 📂 models/
    ├── 📂 checkpoints/
    │   └── ltx-2-19b-dev-fp8.safetensors
    ├── 📂 text_encoders/
    │   └── gemma_3_12B_it_fp8_e4m3fn.safetensors
    ├── 📂 loras/
    │   └── ltx-2-19b-distilled-lora-384.safetensors
    ├── 📂 latent_upscale_models/
    │   └── ltx-2-spatial-upscaler-x2-1.0.safetensors
    └── 📂 Clip/
        └── ltx-2.3_text_projection_bf16.safetensors
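The save-location folders can be created up front so the downloads have somewhere to land. A minimal sketch; the folder names mirror the tree above (including the "Clip" capitalization as written there), and the ComfyUI root path is an assumption for your install:

```python
from pathlib import Path

# Folders from the save-location tree above, relative to the ComfyUI root.
MODEL_DIRS = [
    "models/checkpoints",
    "models/text_encoders",
    "models/loras",
    "models/latent_upscale_models",
    "models/Clip",
]

def make_model_dirs(comfy_root: str = "ComfyUI") -> list:
    """Create the save-location folders; existing folders are left untouched."""
    created = []
    for rel in MODEL_DIRS:
        d = Path(comfy_root) / rel
        d.mkdir(parents=True, exist_ok=True)  # no error if already present
        created.append(d)
    return created
```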
Custom Nodes used:
https://github.com/evanspearman/ComfyMath (V1.0 only, not required as of V1.5)
https://github.com/kijai/ComfyUI-KJNodes (as of V2.3)
Text 2 Video only:


