Updated: Jan 24, 2026
This is the same as the default workflow in ComfyUI, except that it uses the GGUF custom node. You can insert images, audio, and video at any frame, so one workflow covers many tasks.
Supported modes: T2V, S2V, V2V, and I2V (first, last, or middle frame).
voice clone: input a few seconds of audio, then crop those same few seconds from the result after generation finishes.
reference image: input a starting image, then instruct it to perform a completely different action. (The character descriptions stay the same, however.) This is effectively a failed I2V. As with voice clone, crop the initial image from the result.
extend video: input the images and audio extracted from the video. The result is extended for the remaining length.
GGUF custom node: https://github.com/city96/ComfyUI-GGUF
(Please update your GGUF node and ComfyUI to the latest versions.)
LTX2 GGUF: https://huggingface.co/Kijai/LTXV2_comfy/tree/main/diffusion_models
VAE: https://huggingface.co/Kijai/LTXV2_comfy/tree/main/VAE
upscale model: https://huggingface.co/Lightricks/LTX-2/resolve/main/ltx-2-spatial-upscaler-x2-1.0.safetensors
text encoder:
gemma3 GGUF: https://huggingface.co/unsloth/gemma-3-12b-it-GGUF/tree/main
embedding: https://huggingface.co/Kijai/LTXV2_comfy/tree/main/text_encoders
Place the text encoder-related files here: ComfyUI\models\text_encoders
The audio VAE goes here: ComfyUI\models\checkpoints
The upscale model goes here: ComfyUI\models\latent_upscale_models
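To confirm the files landed in the right folders, a small check like the following may help. It is only a sketch: the three folder paths come from the notes above, while the function name check_model_dirs and the default root "ComfyUI" are made up for illustration and should be adjusted to your install.

```python
from pathlib import Path

# Folder layout as stated in the notes above (relative to the ComfyUI root).
EXPECTED_DIRS = {
    "text encoder / embedding": Path("models") / "text_encoders",
    "audio VAE": Path("models") / "checkpoints",
    "upscale model": Path("models") / "latent_upscale_models",
}


def check_model_dirs(comfy_root):
    """Return {label: (folder, file_count)}; file_count is None if the folder is missing."""
    root = Path(comfy_root)
    report = {}
    for label, rel in EXPECTED_DIRS.items():
        folder = root / rel
        if folder.is_dir():
            count = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            count = None
        report[label] = (folder, count)
    return report


if __name__ == "__main__":
    # "ComfyUI" is a placeholder root; pass the real install path instead.
    for label, (folder, count) in check_model_dirs("ComfyUI").items():
        status = "missing" if count is None else f"{count} file(s)"
        print(f"{label}: {folder} -> {status}")
```

It only counts files per folder; it does not verify that a given file is the correct model.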
Use the distilled model and distilled-embedding, or use the dev model and dev-embedding with distilled-lora.
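The two combinations above can be written down as a quick sanity check. This is a sketch only: the labels "distilled" and "dev" and the function is_listed_setup are illustrative names, not identifiers from the workflow or from ComfyUI.

```python
# The two setups named in the note above, as
# (model, embedding, distilled LoRA enabled).
LISTED_SETUPS = {
    ("distilled", "distilled", False),
    ("dev", "dev", True),
}


def is_listed_setup(model, embedding, use_distilled_lora):
    """True if the combination matches one of the two setups listed above."""
    return (model, embedding, bool(use_distilled_lora)) in LISTED_SETUPS


if __name__ == "__main__":
    print(is_listed_setup("distilled", "distilled", False))
    print(is_listed_setup("dev", "distilled", True))
```

Other combinations may still run; the check only flags anything that differs from the two setups listed here.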
T2V: set bypass image on
I2V: set bypass image off
You can bypass the upscale node for low-resolution output.
Try starting with a lower length (perhaps 9).
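The suggested length of 9 fits the 8n+1 pattern (9, 17, 25, ...) used for frame counts in earlier LTX-Video releases. Assuming this workflow follows the same rule, which is not stated above, a helper like this rounds a requested length down to the nearest such value. The name snap_length is made up for illustration.

```python
def snap_length(frames):
    """Round down to the nearest 8n+1 value (1, 9, 17, 25, ...).

    Assumption: valid lengths follow 8n+1, as in earlier LTX-Video
    releases. This is not confirmed for this workflow.
    """
    if frames < 1:
        return 1
    return ((frames - 1) // 8) * 8 + 1


if __name__ == "__main__":
    for requested in (9, 20, 100):
        print(requested, "->", snap_length(requested))
```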
