This workflow currently supports three types of models:
Standard LTX 2.3 distilled
LTX 2.3 distilled GGUF
10Eros
💡 The models are self-contained. You can safely delete the entire group of whichever model you don't use without breaking the workflow. The remaining model groups will work independently without any additional changes needed.
This workflow is a modular and flexible text/image/audio-to-video generation system built in ComfyUI, designed to give full control over video creation using LTX-based models. It allows you to easily mix and match multiple generation modes such as text-to-video, image-to-video, lipsync, and fully guided animation by enabling or disabling grouped nodes.
📝 Personal notes:
The 10Eros model is better for NSFW content, and the standard model is better for SFW generations. The body movement of the 10Eros model can sometimes help SFW content too, but as a general rule, use each model for its intended purpose.
Try to always use two-phase sampling (half-res + 2× upscaler). This yields the best quality and character consistency. LTX is poor at preserving character identity, so don't make it worse with a single-pass generation. The upscaler model adds extra detail and improves character consistency, which is why I recommend it.
Don't use the detailer when generating "Amateur look" videos. It adds a light layer of detail to the final result, which most of the time looks too "polished" for a real amateur recording; amateur-style videos look more real when they look low quality.
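To make the two-phase recommendation concrete, here is a minimal sketch of the resolution arithmetic: sample at half the target size, then let the 2× upscaler bring it back. The function name `two_pass_sizes` and the `multiple=32` rounding step are assumptions for illustration only; check the size constraints stated in the workflow's own notes before relying on them.

```python
def two_pass_sizes(target_w: int, target_h: int, multiple: int = 32):
    """Return ((half_w, half_h), (final_w, final_h)) for half-res sampling + 2x upscale.

    `multiple` is an ASSUMED size constraint, not a value taken from the workflow.
    """
    if multiple <= 0:
        raise ValueError("multiple must be positive")

    def snap(value: int) -> int:
        # Halve the target, then round down to the nearest allowed multiple.
        return max(multiple, (value // 2) // multiple * multiple)

    half = (snap(target_w), snap(target_h))
    # The 2x upscaler doubles whatever the first pass produced.
    final = (half[0] * 2, half[1] * 2)
    return half, final


if __name__ == "__main__":
    half, final = two_pass_sizes(1920, 1080)
    print("first pass:", half, "after 2x upscale:", final)
```

Note that when the halved size is not a multiple of the rounding step, the final output ends up slightly smaller than the requested target, because the second pass only doubles the first-pass size.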
Main features
GGUF support
Prompt relay for segmented prompts
NSFW prompt enhancer
Text, image, audio, and ControlNet-driven video generation
LoRA support (character, style, and voice via ID LoRA)
Custom or AI-generated audio with automatic syncing
Reference image + up to 7 keyframes (FFLF animation control)
ControlNet video guidance with hybrid reference support
Half-res sampling + 2× upscaling for faster high-quality results
LTX detailer for enhanced final output
Common Setups
Text to video:
All bypassers disabled + Prompt + Default audio
Image to video:
Prompt + Reference image + Default audio
Lipsync:
Prompt + Reference image + Custom audio
Audio to video:
Prompt + Custom audio only
Character LoRA + voice cloning:
Prompt + Character LoRA + ID LoRA + Default audio
Voice reference to video:
Prompt + ID LoRA + Default audio
OR
Prompt + ID LoRA + Reference image + Default audio
Character animation:
Prompt + ControlNet + Reference image + (Custom or Default audio)
First frame → last frame:
Prompt + Keyframe 1 + Keyframe 2 + (Custom or Default audio)
First → middle → last frame:
Prompt + Keyframe 1 + Keyframe 2 + Keyframe 3 + (Custom or Default audio)
Character animation with custom voice:
Prompt + Reference image + ID LoRA + ControlNet + Default audio
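The setups above can be read as a lookup table from enabled inputs to a generation mode. The sketch below encodes that table in Python so you can check which setup a given combination corresponds to. All keys and names here (`SETUPS`, `matching_setups`, `"reference_image"`, and so on) are hypothetical labels chosen for illustration, not ComfyUI node or group identifiers.

```python
# Hypothetical labels for the workflow's inputs; not real ComfyUI node names.
def with_audio(*inputs):
    """Expand "(Custom or Default audio)" into two alternative input sets."""
    return [frozenset(inputs) | {"custom_audio"},
            frozenset(inputs) | {"default_audio"}]


def exact(*inputs):
    """A single input combination."""
    return [frozenset(inputs)]


# Each setup maps to the list of input combinations that select it.
SETUPS = {
    "text_to_video": exact("prompt", "default_audio"),
    "image_to_video": exact("prompt", "reference_image", "default_audio"),
    "lipsync": exact("prompt", "reference_image", "custom_audio"),
    "audio_to_video": exact("prompt", "custom_audio"),
    "character_lora_voice_cloning": exact(
        "prompt", "character_lora", "id_lora", "default_audio"),
    "voice_reference_to_video": (
        exact("prompt", "id_lora", "default_audio")
        + exact("prompt", "id_lora", "reference_image", "default_audio")),
    "character_animation": with_audio("prompt", "controlnet", "reference_image"),
    "first_last_frame": with_audio("prompt", "keyframe_1", "keyframe_2"),
    "first_middle_last_frame": with_audio(
        "prompt", "keyframe_1", "keyframe_2", "keyframe_3"),
    "character_animation_custom_voice": exact(
        "prompt", "reference_image", "id_lora", "controlnet", "default_audio"),
}


def matching_setups(enabled):
    """Return the names of all setups whose inputs equal the enabled set."""
    enabled = frozenset(enabled)
    return [name for name, variants in SETUPS.items() if enabled in variants]


if __name__ == "__main__":
    print(matching_setups({"prompt", "reference_image", "custom_audio"}))
```

An empty result means the combination is not one of the listed common setups; it may still work, since the groups are designed to be mixed and matched.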
Detailed instructions are contained in the workflow itself:
Red nodes are instructions and useful notes.
Yellow nodes are configurable elements you can adjust to your needs.


