Chaining LTX 2.3 Workflows for Longer Videos on Low-VRAM GPUs

Introduction

One of the biggest limitations in current AI video generation is the short duration of clips. While LTX 2.3 produces incredible results, the "short burst" nature of the model can make storytelling difficult.

In this guide, I will show you a "One-Shot" method to bypass these limits. Instead of generating multiple clips separately and stitching them together in a video editor, we will build a Chained Workflow in ComfyUI. This allows you to generate a long, continuous sequence with synchronized sound in a single execution.

The Concept: The "Last Frame" Continuity Method

The secret to making multiple clips feel like one continuous movie is the Last Frame Method.

Instead of starting each new video segment from a random image, we take the final frame of Video A and use it as the initial input image for Video B. This creates a seamless visual bridge between segments, allowing the motion to flow naturally from one clip to the next.
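
As a rough illustration of the idea, here is a minimal Python sketch of the hand-off between segments. It is not LTX or ComfyUI code: frames are plain strings, and fake_generate is a hypothetical stand-in for an image-to-video block that returns its input image as the first frame.

```python
from typing import Callable, List

# A frame is just a label here; in ComfyUI it would be an IMAGE tensor.
Frame = str


def chain_segments(
    start_image: Frame,
    prompts: List[str],
    generate: Callable[[Frame, str], List[Frame]],
) -> List[List[Frame]]:
    """Generate one segment per prompt, seeding each with the previous last frame."""
    segments: List[List[Frame]] = []
    current = start_image
    for prompt in prompts:
        frames = generate(current, prompt)
        segments.append(frames)
        current = frames[-1]  # last frame of this segment seeds the next one
    return segments


def fake_generate(image: Frame, prompt: str) -> List[Frame]:
    """Hypothetical stand-in for an image-to-video block (not a real LTX call)."""
    return [image, f"{prompt}-1", f"{prompt}-2"]


segments = chain_segments("start", ["walk", "run"], fake_generate)
print(segments[0][-1], "->", segments[1][0])
```

The printed pair shows that the second segment opens on exactly the frame the first one ended on, which is the visual bridge described above.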

How to Build the Workflow

To achieve this in ComfyUI, you don't need to start from scratch for every clip. Follow these steps:

Duplicate & Chain: Duplicate your LTX 2.3 Image-to-Video workflow block as many times as you need (2x, 3x, or more).
The Link: Take the last frame of the first video block and connect its IMAGE output to the IMAGE input of the next video block. Repeat this for each additional block in the chain.
Organization for Speed: To keep the workflow manageable, I recommend placing the Prompt Nodes, LoRA Loaders, and Duration Settings for each segment side-by-side. This allows you to quickly tweak the narrative of the second or third clip without hunting through a "spaghetti" of wires.
The Final Merge: Use a Combine Video Node at the end of your chain. This node takes all of your generated video segments and merges them into a single file.
Audio Integration: Use the concatenation features of the same node to merge the audio tracks as well, so that your soundscape matches the length of the extended video.
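
To sanity-check the final merge, it can help to confirm that the concatenated audio is as long as the concatenated video. The sketch below does that arithmetic in plain Python. The frame counts, frame rate, and sample rate are hypothetical example values, not LTX 2.3 defaults, and merged_durations is an illustrative helper rather than part of any ComfyUI node.

```python
from typing import List, Tuple


def merged_durations(
    frame_counts: List[int],
    sample_counts: List[int],
    fps: float,
    sample_rate: int,
) -> Tuple[float, float]:
    """Return (video_seconds, audio_seconds) after concatenating all segments."""
    video_seconds = sum(frame_counts) / fps
    audio_seconds = sum(sample_counts) / sample_rate
    return video_seconds, audio_seconds


# Hypothetical example: three segments of 120 frames at 24 fps,
# each with 240000 audio samples at 48000 Hz (5 seconds per segment).
video_s, audio_s = merged_durations(
    [120, 120, 120], [240000] * 3, fps=24, sample_rate=48000
)
in_sync = abs(video_s - audio_s) < 1 / 24  # differ by less than one frame
print(video_s, audio_s, in_sync)
```

If the two durations differ by more than a frame, one of the segments probably has a different length setting than its audio track.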

The Challenge: Maintaining Character Consistency

While the "Last Frame" method is excellent for motion, it faces one major hurdle: Visual Drift.

As the video progresses, the model may "forget" the specific details of a face or body shape, leading to a loss of consistency. If the last frame of Video A is slightly blurry or lacks detail, Video B will inherit those flaws.

How to fight consistency loss:

Reinforce the Prompt: In the prompt node for the second and third segments, do not just describe the action. You must also re-describe the subject. Explicitly tell the model to maintain the "full face" or "entire body" as seen in the original starting image.
Advanced Solutions (Coming Soon): I am currently experimenting with ID-LoRAs and specialized face-consistency nodes to automate this. Integrating an ID-LoRA would allow the workflow to "lock" the character's identity regardless of how many segments you chain together.
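
One way to make prompt reinforcement a habit is to keep the subject description in one place and prepend it to every segment's action. The following is a small sketch of that idea in plain Python. The helper name, the subject text, and the actions are all hypothetical examples, and the resulting strings would still need to be pasted into each segment's prompt node.

```python
from typing import List


def build_segment_prompts(subject: str, actions: List[str]) -> List[str]:
    """Prefix every segment's action with the same subject description."""
    reminder = "Maintain the full face and entire body as seen in the starting image."
    return [f"{subject} {action} {reminder}" for action in actions]


# Hypothetical subject and actions, for illustration only.
subject = "A woman with short red hair wearing a green raincoat."
actions = [
    "She walks through a crowded market.",
    "She stops and looks up at the rain.",
]

prompts = build_segment_prompts(subject, actions)
for number, prompt in enumerate(prompts, start=1):
    print(f"Segment {number}: {prompt}")
```

Because every prompt starts with the same description, later segments keep receiving the details they would otherwise tend to lose.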

VRAM Management

In my personal setup, I am running on Windows with a dual-monitor configuration and 10GB of VRAM. Because Windows and the desktop environment consume GPU memory, ComfyUI has only about 8GB of VRAM available for the actual generation. For maximum performance and stability, I highly recommend running ComfyUI on a dedicated Debian server without a desktop environment (headless). In a headless Linux environment, almost all of your VRAM is available to the generation process, which leaves more room for the model and should make generation noticeably faster.
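
To see how much VRAM the desktop is already holding before you start ComfyUI, you can query nvidia-smi. The sketch below assumes an NVIDIA GPU with nvidia-smi on the PATH. The sample string is a hypothetical reading that mirrors the numbers above (roughly 2GB in use on a 10GB card), and parse_memory is an illustrative helper.

```python
import subprocess
from typing import List, Tuple

# nvidia-smi query that prints "used, total" in MiB, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text: str) -> List[Tuple[int, int]]:
    """Parse 'used, total' lines (MiB) into a list of integer pairs."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(value) for value in line.split(","))
        rows.append((used, total))
    return rows


# Hypothetical reading: 2048 MiB already in use on a 10240 MiB card.
sample = "2048, 10240\n"
used_mib, total_mib = parse_memory(sample)[0]
free_mib = total_mib - used_mib
print(f"sample: {free_mib} MiB free of {total_mib} MiB")

if __name__ == "__main__":
    try:
        output = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        output = ""  # no NVIDIA driver or tool available on this machine
    for used, total in parse_memory(output):
        print(f"this machine: {total - used} MiB free of {total} MiB")
```

Running this once on the Windows desktop and once on a headless server gives a direct comparison of how much memory is left for generation.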

Summary

By chaining workflows, you move from being a "clip generator" to a "director." This method allows for complex storytelling, cinematic pacing, and seamless transitions—all within a single ComfyUI run.

Check out the attached workflow below to try it yourself!

More explanation: