V2.3 LTX-2.3 DEV & Distilled Video with Audio
Image to Video and a Text to Video workflow; both can use your own prompts or Ollama-generated/enhanced prompts.
Works with the latest LTX 2.3 Distilled model (8 steps, CFG=1) or the Dev model (20 steps, CFG=3.5).
Downloads:
LTX-2.3 Distilled & Dev Models (fp8_scaled): https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/diffusion_models
Text encoder 1 (fp8_e4m3fn, same as LTX-2): https://huggingface.co/GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn/tree/main
Text encoder 2 (projection_bf16): https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/text_encoders
Video & audio VAE: https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/vae
Loras:
Spatial upscaler (x2-1.0): https://huggingface.co/Lightricks/LTX-2.3/tree/main
Distilled Lora for upscaler (lora.384): https://huggingface.co/Lightricks/LTX-2.3/tree/main
Detailer Lora (same as LTX-2): https://huggingface.co/Lightricks/LTX-2-19b-IC-LoRA-Detailer/tree/main
Ollama Model (prompt only, fast): https://ollama.com/mirage335/Llama-3-NeuralDaredevil-8B-abliterated-virtuoso
Alternative model with vision (reads input image + prompt, slower): https://ollama.com/huihui_ai/qwen3-vl-abliterated
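If you prefer scripting the V2.3 downloads, the list above can first be turned into a small plan. A minimal sketch in Python; the repo/subfolder strings come from the links above, while the ComfyUI target folders are my assumptions based on a standard install layout, so adjust them to yours:

```python
# Map each V2.3 download source (repo, plus subfolder where the link has one)
# to an assumed ComfyUI target folder, relative to the ComfyUI root.
V23_DOWNLOADS = {
    "Kijai/LTX2.3_comfy/diffusion_models":   "models/diffusion_models",      # Distilled & Dev (fp8_scaled)
    "GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn":  "models/text_encoders",         # text encoder 1 (fp8_e4m3fn)
    "Kijai/LTX2.3_comfy/text_encoders":      "models/clip",                  # text encoder 2 (projection_bf16)
    "Kijai/LTX2.3_comfy/vae":                "models/vae",                   # video & audio VAE
    "Lightricks/LTX-2.3":                    "models/latent_upscale_models", # spatial upscaler + its distilled Lora
    "Lightricks/LTX-2-19b-IC-LoRA-Detailer": "models/loras",                 # Detailer Lora
}

# Print the plan so you can check each target before downloading anything.
for source, target in V23_DOWNLOADS.items():
    print(f"{source:40s} -> {target}")
```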
V1.5 LTX-2 DEV Video with Audio including latest 🅛🅣🅧 Multimodal Guider
Image to Video and a Text to Video workflow, both can use own Prompts or Ollama generated/enhanced prompts.
Replaced the Guider node with the latest Multimodal Guider node; see details in the WF notes or here: https://ltx.io/model/model-blog/ltx-2-better-control-for-real-workflows Previously there was a single CFG parameter for both audio and video. With the Multimodal Guider, audio and video can now be tweaked separately, with even more parameters.
Added a Power Lora Loader node to inject further Loras.
Use the Image to Video Adapter Lora to improve motion for I2V: https://huggingface.co/MachineDelusions/LTX-2_Image2Video_Adapter_LoRa/tree/main
Replaced a node so the ComfyMath custom nodes are no longer required.
V1.0 LTX-2 DEV Video with Audio:
Image to Video and a Text to Video workflow with your own prompts or Ollama-generated/enhanced prompts.
Set up for the LTX-2 Dev model.
Uses the Detailer Lora for better quality and the LTX tiled VAE to avoid OOM errors and visible grid artifacts.
Two-pass rendering (motion + upscale); the upscale pass uses the distilled and spatial upscaler Loras.
Set up with the latest LTXVNormalizingSampler to improve video and audio quality.
Text to Video can use dynamic prompts with wildcards.
I use these launch parameters for ComfyUI to avoid OOM (my setup: 16 GB VRAM / 64 GB RAM):
--lowvram --cache-none --reserve-vram 6 --preview-method none
=> OBSOLETE with the latest ComfyUI updates, which brought better memory management.
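For reference, the flags above slot into the usual ComfyUI launch line. A config-fragment sketch, assuming a source checkout started via main.py (on recent ComfyUI builds the flags are no longer needed):

```shell
# Launch ComfyUI with the OOM-avoidance flags listed above.
# The python command and the path to main.py depend on your install.
python main.py --lowvram --cache-none --reserve-vram 6 --preview-method none
```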
Download Files (Workflow V1.0 and V1.5):
Find the Model/Lora Loader nodes inside the Sampler subgraph node.
- LTX2 Dev Model (dev_Fp8): https://huggingface.co/Lightricks/LTX-2/tree/main
- Detailer Lora: https://huggingface.co/Lightricks/LTX-2-19b-IC-LoRA-Detailer/tree/main
- Distilled (lora-384) & Spatial upscaler Lora: https://huggingface.co/Lightricks/LTX-2/tree/main
- VAE (already included in the dev_fp8 model above, but needed if you use GGUF models): https://huggingface.co/Lightricks/LTX-2/tree/main/vae
- Text encoder (fp8_e4m3fn): https://huggingface.co/GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn/tree/main
- Image to Video Adapter Lora (more motion with I2V): https://huggingface.co/MachineDelusions/LTX-2_Image2Video_Adapter_LoRa/tree/main
- Ollama Models:
Prompt-only model (fast): https://ollama.com/goonsai/josiefied-qwen2.5-7b-abliterated-v2
Alternative model with vision (reads input image + prompt; slower; can do reasoning by enabling "think" in the Ollama generate node): https://ollama.com/huihui_ai/qwen3-vl-abliterated
Other uncensored models I have tested:
27B model with vision; very slow, but has broad knowledge: https://ollama.com/mdq100/Gemma3-Instruct-Abliterated
Small, very fast model (reads prompt only): https://ollama.com/artifish/llama3.2-uncensored
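Outside of ComfyUI, the same prompt enhancement can be reproduced against a local Ollama server through its standard `/api/generate` endpoint. A minimal sketch, standard library only; the default model is taken from the list above, and the instruction text wrapped around the user prompt is my own illustration, not what the workflow's node sends:

```python
import json
import urllib.request

def build_generate_payload(user_prompt: str,
                           model: str = "artifish/llama3.2-uncensored",
                           think: bool = False) -> dict:
    """Build the request body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": ("Rewrite this idea as one detailed, cinematic video prompt "
                   "covering subject, motion, camera and audio: " + user_prompt),
        "stream": False,  # return one JSON object instead of a token stream
        "think": think,   # enable on reasoning-capable models (see note above)
    }

def enhance_prompt(user_prompt: str,
                   host: str = "http://localhost:11434", **kwargs) -> str:
    """Send the payload to a locally running Ollama server and return the text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_generate_payload(user_prompt, **kwargs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Usage: `enhance_prompt("a cat surfing a wave")` returns the enhanced prompt string, which you can paste into the workflow's prompt field.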
Save Location:
📂 ComfyUI/
└── 📂 models/
    ├── 📂 checkpoints/
    │   └── ltx-2-19b-dev-fp8.safetensors
    ├── 📂 text_encoders/
    │   └── gemma_3_12B_it_fp8_e4m3fn.safetensors
    ├── 📂 loras/
    │   └── ltx-2-19b-distilled-lora-384.safetensors
    ├── 📂 latent_upscale_models/
    │   └── ltx-2-spatial-upscaler-x2-1.0.safetensors
    └── 📂 Clip/
        └── ltx-2.3_text_projection_bf16.safetensors
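The save-location folders can be created up front so the downloads have somewhere to land. A minimal sketch; the folder names mirror the tree above (including the "Clip" capitalization as written there), and the ComfyUI root path is an assumption for your install:

```python
from pathlib import Path

# Folders from the save-location tree above, relative to the ComfyUI root.
MODEL_DIRS = [
    "models/checkpoints",
    "models/text_encoders",
    "models/loras",
    "models/latent_upscale_models",
    "models/Clip",
]

def make_model_dirs(comfy_root: str = "ComfyUI") -> list:
    """Create the save-location folders; existing folders are left untouched."""
    created = []
    for rel in MODEL_DIRS:
        d = Path(comfy_root) / rel
        d.mkdir(parents=True, exist_ok=True)  # no error if already present
        created.append(d)
    return created
```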
Custom Nodes used:
https://github.com/evanspearman/ComfyMath (V1.0 only, not required as of V1.5)
https://github.com/kijai/ComfyUI-KJNodes (as of V2.3)
Text 2 Video only:


