
【Quick Tutorial】LTX-2 T2V + I2V (workflows included) + (GGUF)


If you haven't tried LTX-2 yet, aren't sure how to get started, keep hitting a bunch of errors, or just want to save time and skip all the headaches, then you absolutely don't want to miss this.

The cool thing about LTX-2 is that it can generate videos with sound, and it does it super fast!

Seriously, even an 8GB graphics card can run it without a problem.


(If you can't see the image clearly, just right-click and open it in a new tab.)


Preparation:

First, you'll want to understand the different types of LTX-2 models. This will really help you grasp how the LTX-2 workflow operates.

Checkpoint Models: (These are marked with a red box).

You've got two kinds: one is a non-distilled base model, and the other is a distilled one.

Here's how they're different in your workflow:

The non-distilled one needs a (shift) sigma schedule + 20 steps + CFG 4, while the distilled one only needs (manual sigma) + 8 steps + CFG 1.

Distilled LoRA: (These are marked with a yellow box). These work a lot like a Wan + Lightning LoRA: they're mainly used with the non-distilled checkpoint models to cut down the number of steps when you're upscaling or generating. (This is also why you won't be using this LoRA in workflows that already use the distilled models.)

Upscaler-x2: (This one's marked with a green box). When you're using LTX-2, the official suggestion is to first generate your result at half your target resolution, and then upscale it by x2. Doing it this way gives you a lot more detail in your final output, and it's usually faster too.
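To make the "generate at half resolution, then upscale x2" step concrete, here's a tiny helper that works out the generation resolution for a given final target. It's a sketch of the idea only: the rounding to multiples of 32 is my assumption (video latents usually want dimensions divisible by a fixed block size), not an official LTX-2 requirement.

```python
def half_resolution(target_w: int, target_h: int, multiple: int = 32) -> tuple[int, int]:
    """Return the half-size generation resolution for a final target size.

    LTX-2's suggested flow: generate at half the target resolution,
    then run the x2 upscaler. Rounding to `multiple` is an assumption
    (video models typically want dims divisible by 32).
    """
    w = (target_w // 2) // multiple * multiple
    h = (target_h // 2) // multiple * multiple
    return w, h

# For a 1920x1088 final target, generate at:
print(half_resolution(1920, 1088))  # (960, 544)
```

So for a 1920x1088 target you'd generate at 960x544 and let the x2 upscaler take it the rest of the way.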

1-.png

You'll need to upgrade ComfyUI to at least version 0.9.2.

This particular version fixes a few bugs and also cuts down on the VRAM usage when you're running LTX-2. You can just upgrade it right within ComfyUI itself.

Snipaste_2026-01-22_14-20-04.png

or

To upgrade, just run the .bat file located at this path.

3-.png

or

Just type cmd directly into the ComfyUI folder's address bar.

Then, in the terminal that opens, type git reset --hard 8f40b43.

After that, copy the requirements.txt file from that directory to ComfyUI_windows_portable, and finally, update your dependencies by using .\python_embeded\python.exe -m pip install -r requirements.txt.
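Put together, the manual update steps above look like this. The paths assume the standard ComfyUI_windows_portable layout (portable build with an embedded Python); adjust them if your install lives elsewhere.

```shell
REM From inside ComfyUI_windows_portable\ComfyUI:
git reset --hard 8f40b43

REM Copy the updated requirements file up to ComfyUI_windows_portable:
copy requirements.txt ..\requirements.txt

REM From ComfyUI_windows_portable, update dependencies with the embedded Python:
.\python_embeded\python.exe -m pip install -r requirements.txt
```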


Using ComfyUI Workflow Templates (safetensors):

I've already packaged up the official ComfyUI workflows and put them right here. I've also unpacked them from the subgraph, which should make it much easier to correctly download and set up the models.

The thing is, the official templates usually look like this, so most people end up running into model setup problems if they try to use them directly:

954af7c3-1bba-4222-b307-7dc2dcc552cb.png

When it's unpacked from the subgraph, it looks like this. Don't worry, it's not as complicated as it seems:

874eda91-1b6e-4bb9-861e-55745b572816.png

You just need to download the models from the links shown on the left of each workflow and then put them in the folders indicated in the notes.

But make sure you pay attention to whether the workflow is for the regular (normal) version or the distilled version. Then, once you've placed the models, press 'R' to refresh them.
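If you're not sure where the files go, the usual ComfyUI model folders look like this. These are the common defaults only; the exact filenames and folders for your setup are the ones written in the workflow's notes, so follow those if they differ.

```
ComfyUI/models/
├── checkpoints/     # LTX-2 checkpoint (normal or distilled)
├── loras/           # distilled LoRA (pairs with the normal checkpoint)
├── text_encoders/   # CLIP / text-encoder files (separated workflows)
└── vae/             # VAE files (separated workflows)
```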

You'll need to select the correct one in the workflow based on its name. Here, I'll show you the difference between the distilled and non-distilled model workflows: Just pick two workflows, open them up, compare them, and pay attention to what's marked in the white box:

Normal Version:

93404b3f-e89e-42ad-9872-91985a5e1e27.png

Distilled Version:

5006bdca-7e9b-476b-9d44-6ce11d8e9f36.png

At this point, you're good to go!

Just hit the generate button and start playing around.


🆘

You might be wondering why you don't see files like "ltx-2-19b-embeddings_connector" here, especially since you've probably seen them in other workflows.

It's pretty simple: those files are actually part of the "Separated LTX2 checkpoint" workflow.

What that means is the CLIP, VAE, and other similar files are kept separate from the checkpoint itself.

Workflows like that are generally used for more flexible setups or in GGUF workflows.

Honestly, I actually prefer to mix GGUF and safetensors, and I'll tell you all about that in the "Separated LTX2 checkpoint" workflow section below.

🆘


Using the Separated LTX2 Checkpoint Workflow (safetensors + GGUF):

Models DL:

https://huggingface.co/Kijai/LTXV2_comfy/tree/main

For this workflow, I'm going to focus mainly on the model loading section (that's marked with the white box).

All the other nodes are pretty much the same as in the workflow templates we covered earlier, so I'll just be highlighting the differences there.

5011c8bb-4457-4f01-be49-24c9e55bdda4.png

If you unpack this part from the subgraph, you'll see that the checkpoint and CLIP sections have been replaced with GGUF nodes, and the VAE is entirely replaced with the Kijai version.

This means you'll also need the ComfyUI-KJNodes. (Just a heads-up: the official VAE and the Kijai VAE versions can't be mixed; they need to be consistent.)

Notice that because this is a distilled checkpoint, I've bypassed the distilled LoRA. Using both at the same time will just burn out the image.

cd049a0c-8ff2-41be-b12c-4fbb9d8791b3.png

However, the ltx-2-19b-embeddings_connector itself also comes in two versions.

The distilled version needs to be used with the distilled checkpoint, and the normal version with the normal checkpoint.

84a7c852-2a39-42c9-a832-8a376497a974.png

I think that's enough for the tutorial.

I've already packed up the workflows and uploaded them as an attachment in the top right corner for you to download.

If you run into any issues while using them, feel free to reply below.


Here are some of my thoughts and ramblings:

What really interests me about LTX-2 is its ability to learn the audio from video clips during training.

This means we could potentially recreate a character's voice and appearance from a movie, which is just awesome.

Even though I'm not completely satisfied with its output quality yet, the potential is undeniably fantastic. Not to mention, it's seriously fast.

Also, I don't really get why there isn't an audio-to-audio workflow template in the official ComfyUI examples. It doesn't seem that hard to make. Maybe I'll try putting one together when I have some time.

4