This is not my model. Don't ask me questions. I don't know the answers.
Go to the official LTX-2.3 HuggingFace repo for answers.
I've uploaded the model here only so I can tag the videos I generate with it. It will most likely be taken down once the official owners or CivitAI upload their own version, so be warned.
The following is the original README as of time of publishing.
LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.
Model Checkpoints
| Name | Notes |
| --- | --- |
| ltx-2.3-22b-dev | The full model, flexible and trainable, in bf16 |
| ltx-2.3-22b-distilled | The distilled version of the full model, 8 steps, CFG=1 |
| ltx-2.3-22b-distilled-lora-384 | A LoRA version of the distilled model, applicable to the full model |
| ltx-2.3-spatial-upscaler-x2-1.0 | An x2 spatial upscaler for the ltx-2.3 latents, used in multi-stage (multiscale) pipelines for higher resolution |
| ltx-2.3-spatial-upscaler-x1.5-1.0 | An x1.5 spatial upscaler for the ltx-2.3 latents, used in multi-stage (multiscale) pipelines for higher resolution |
| ltx-2.3-temporal-upscaler-x2-1.0 | An x2 temporal upscaler for the ltx-2.3 latents, used in multi-stage (multiscale) pipelines for higher FPS |
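The upscaler checkpoints in the table scale the latents of an initial low-resolution, low-FPS pass. As a minimal sketch (the function and its defaults are illustrative, not part of the official API), the output dimensions of a multi-stage pipeline can be computed like this:

```python
def upscaled_shape(width, height, fps, spatial=2.0, temporal=2.0):
    """Output size after applying latent upscalers in a multi-stage pipeline.

    spatial  -> factor from a spatial upscaler (x2 or x1.5 checkpoints)
    temporal -> factor from the temporal upscaler (x2 checkpoint, raises FPS)
    """
    return int(width * spatial), int(height * spatial), int(fps * temporal)

# x2 spatial + x2 temporal on a 960x544 @ 25 fps base pass
print(upscaled_shape(960, 544, 25))                              # -> (1920, 1088, 50)

# x1.5 spatial upscaler only, keeping the base frame rate
print(upscaled_shape(960, 544, 25, spatial=1.5, temporal=1.0))   # -> (1440, 816, 25)
```

The base resolution and frame rate above are example numbers; consult the official repo for the resolutions each upscaler was trained on.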
Model Details
Developed by: Lightricks
Model type: Diffusion-based audio-video foundation model
Language(s): English
Online demo
LTX-2.3 is accessible right away via the API Playground.
Run locally
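When running locally, the distilled and full checkpoints want different sampler settings: per the checkpoint table, the distilled model is tuned for 8 steps with CFG=1. A small sketch of choosing settings by checkpoint name; the parameter names follow the common diffusers convention, and the dev-model values are illustrative assumptions, not official recommendations:

```python
def sampler_settings(checkpoint_name):
    # Distilled checkpoint: 8 denoising steps, CFG effectively disabled
    # (guidance scale 1), as stated in the checkpoint table.
    if "distilled" in checkpoint_name:
        return {"num_inference_steps": 8, "guidance_scale": 1.0}
    # Full dev model: ordinary CFG sampling. These numbers are illustrative;
    # consult the official HuggingFace repo for recommended values.
    return {"num_inference_steps": 40, "guidance_scale": 3.0}

print(sampler_settings("ltx-2.3-22b-distilled"))
# -> {'num_inference_steps': 8, 'guidance_scale': 1.0}
```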
Direct use license
You can use the models (full, distilled, upscalers, and any derivatives of them) for any purpose permitted under the license.

