The Problem
Character consistency is the white whale of AI generation. The standard playbook is LoRA training, IP-Adapter, or multi-view diffusion models. They work, but they’re heavy: training data, compute time, model-specific quirks, and results that still drift between generations.
This tutorial takes a completely different approach. Instead of generating consistent images, you generate one 8-second video. A smooth 360-degree camera orbit around your character. The video model handles consistency for free because every frame comes from the same continuous generation. Then you extract frames and drop them into a lightweight web viewer.
Fully open-source. Runs locally. Zero API costs.
The result is an interactive character portrait your audience can rotate by dragging. Works on desktop and mobile.
Live demo: Interactive 360 viewer
Video walkthrough: YouTube tutorial
Full step-by-step tutorial page: 360.cyfidesigns.com/ltx-tutorial-preview
Why Video Generation Solves Consistency
Video generation models maintain temporal coherence by design. That’s their entire job: making sure frame N+1 looks like it belongs after frame N. When you set up a camera orbit around a static subject, you exploit that coherence engine to get what amounts to a multi-view character sheet from a single generation.
The newest open-source video models are finally powerful enough to deliver on this. You ask for consistency in the prompt, and models like LTX Video 2.3 are now capable of maintaining it across a full 360-degree rotation.
The demo shown here was generated with the Unsloth Q4_K_M distilled quantization of LTX 2.3. That’s a lower-quality, compressed version of the full model, optimized to run on consumer hardware. The fact that even the quantized distilled variant produces results like this speaks to how capable these open-source models have become.
What You Need
• GPU: NVIDIA with 8GB+ VRAM (12GB+ recommended). RTX 3060 and above.
• ComfyUI: Latest version, installed and working.
• LTX Video 2.3: Model plus ComfyUI nodes from Lightricks/ComfyUI-LTXVideo.
• RTX Video Super Resolution: Optional upscaler from Comfy-Org/Nvidia_RTX_Nodes_ComfyUI.
• ffmpeg: For frame extraction.
All models can be downloaded through ComfyUI’s built-in model manager. The weights are a one-time download, with no API keys and no cloud credits.
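If you’re not sure whether your GPU clears the VRAM bar, a quick way to check on any machine with NVIDIA drivers installed:
# prints the GPU model and total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv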
Step 1: Reference Image
Start with a front-facing character image (any image gen model works). This becomes the first frame of your video via the LTXVAddGuideAdvanced node. The model uses it as an anchor for the entire generation.
This is what separates the LTX workflow from pure text-to-video: you’re giving the model a starting frame, so it knows exactly what the character looks like from the front.
Step 2: The ComfyUI Workflow
The workflow has three key components:
1. LTXVAddGuideAdvanced - Locks your reference image as the starting frame
2. LTX Video 2.3 generation - Produces the orbital video
3. RTX Video Super Resolution - Upscales the output (optional but recommended)
Connect your reference image to the LTXVAddGuideAdvanced node as the guide frame. Set the video generation to maximum duration: 97 frames at 24fps gives you about 4 seconds per generation, so chain two generations for a full orbit.
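Once the graph works in the ComfyUI interface, you can also queue it headlessly through ComfyUI’s HTTP API. A minimal sketch, assuming the workflow has been exported in ComfyUI’s API format and saved, wrapped in a top-level "prompt" key, as workflow_api.json (the filename is just an example):
# queue the exported workflow on a locally running ComfyUI instance (default port 8188)
curl -X POST http://127.0.0.1:8188/prompt \
  -H "Content-Type: application/json" \
  -d @workflow_api.json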
Step 3: The Prompt (This Is Everything)
The prompt structure matters more than anything else. Camera instructions first, character description last:
A slow, smooth 360-degree orbit around a [character description]. Camera rotates
clockwise at a constant speed. Studio lighting on a dark background. Character
stands still in a neutral pose facing forward. Camera maintains consistent distance
throughout the full rotation.
Key principles:
• Camera orbit instructions go first. The model weights early tokens more heavily.
• Physical description is exhaustive. Name every feature: jawline shape, eye color, hair style, fabric textures, belt buckles. The model can only maintain what it was told about.
• Lock the subject still. “Character stands still in a neutral pose” prevents the model from adding motion.
• Simple background. Dark studio removes variables. Complex environments give the model more opportunities to hallucinate.
Step 4: Generate and Evaluate
Run the workflow and evaluate the output. You’re looking for:
• Character identity maintained through the full rotation
• Smooth camera movement without jumps
• No body parts appearing/disappearing
• Clothing consistent front-to-back
Expect a success rate of roughly 75% or better. If a generation doesn’t work, regenerate with the same prompt; variance between runs is normal.
Step 5: Extract Frames
Once you have a good video, extract every frame:
mkdir frames
ffmpeg -i your_video.mp4 frames/frame_%04d.jpg
For an 8-second video at 24fps, you’ll get approximately 192 frames. More frames = smoother rotation.
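If the full-resolution JPEGs are heavier than you want to ship, you can downscale during extraction instead. A minimal sketch, assuming a 1080px-tall output is enough for the viewer:
# scale to 1080px height (width auto, kept even) with slightly stronger JPEG compression
ffmpeg -i your_video.mp4 -vf scale=-2:1080 -q:v 3 frames/frame_%04d.jpg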
Step 6: Build the Viewer
The interactive viewer is surprisingly simple: a single HTML page with a few dozen lines of JavaScript:
<!DOCTYPE html>
<html>
<head>
<style>
body { margin: 0; background: #000; display: flex;
justify-content: center; align-items: center;
height: 100vh; overflow: hidden; }
#viewer { cursor: grab; user-select: none; }
#viewer:active { cursor: grabbing; }
#viewer img { max-height: 100vh; max-width: 100vw; }
</style>
</head>
<body>
<div id="viewer"><img id="frame"></div>
<script>
const TOTAL = 192; // your frame count
const frames = [];
let cur = 0, startX, startF;
// Preload all frames
for (let i = 1; i <= TOTAL; i++) {
const img = new Image();
img.src = `frames/frame_${String(i).padStart(4,'0')}.jpg`;
frames.push(img);
}
const el = document.getElementById('frame');
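// Show frame i, wrapping the index so dragging past either end loops around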
const show = i => {
cur = ((i % TOTAL) + TOTAL) % TOTAL;
el.src = frames[cur].src;
};
show(0);
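// Drag handling: horizontal pointer movement maps to frames (about 3px of drag per frame)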
const viewer = document.getElementById('viewer');
viewer.addEventListener('pointerdown', e => {
startX = e.clientX; startF = cur;
viewer.setPointerCapture(e.pointerId);
});
viewer.addEventListener('pointermove', e => {
if (!e.buttons) return;
const delta = Math.round((e.clientX - startX) / 3);
show(startF - delta);
});
</script>
</body>
</html>
Drop this HTML file alongside your frames/ directory and open it in a browser. Click and drag to rotate.
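To preview it the same way it will be served in production, you can run a local static server instead of opening the file directly. One quick option, assuming Python 3 is installed:
# serve the current directory at http://localhost:8000
python3 -m http.server 8000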
Step 7: Deploy
Upload the HTML file and frames/ directory to any static host: GitHub Pages, Netlify, Vercel, your own server, an S3 bucket. There’s no server-side component. The viewer is a self-contained static page.
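For example, a minimal sketch of publishing via GitHub Pages, assuming the viewer page is saved as index.html, you already have a GitHub repository, and Pages is configured to serve from the gh-pages branch (the repository URL below is a placeholder):
git init
git add index.html frames/
git commit -m "Add 360 character viewer"
git branch -M gh-pages
git remote add origin https://github.com/YOUR_USER/YOUR_REPO.git  # placeholder URL
git push -u origin gh-pages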
Or use 360.cyfidesigns.com to upload your video and have the viewer built automatically (frame extraction, optimization, and hosted viewer included).
Troubleshooting
Face changes mid-rotation: Make the physical description more specific. Name individual features: jawline shape, nose bridge width, eyebrow arch, lip thickness. More anchor points = better consistency.
Character moves instead of camera orbiting: Strengthen camera instructions. Add “static pose, no body movement, camera moves not subject” to the prompt.
Background flickers between frames: Simplify the environment. Dark studio lighting removes variables.
Clothing changes between front and back view: Describe every garment with color, fit, fabric, and details. If you don’t describe it, the model invents it.
Rotation doesn’t complete 360 degrees: Generate at maximum duration. Chain two generations if needed. The LTXVAddGuideAdvanced node can anchor the start of a second generation to the last frame of the first.
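For that chaining step, ffmpeg can grab a frame from the very end of the first clip to use as the guide image for the second generation. A minimal sketch (orbit_part1.mp4 is a placeholder filename):
# seek to 0.1s before the end and save the next decoded frame
ffmpeg -sseof -0.1 -i orbit_part1.mp4 -frames:v 1 last_frame.png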
Free vs. Paid Options
This tutorial covers the fully free, open-source approach using LTX Video 2.3. If you want a one-click cloud option, there’s also a paid version using Google Veo 3 that produces higher-resolution output with less prompt tuning, but requires a Google Cloud API account.
The LTX method gets you 90% of the way there for zero cost.
Resources
• Full tutorial with screenshots: 360.cyfidesigns.com/ltx-tutorial-preview
• Video walkthrough: youtu.be/r2F0UqNl0Pc
• Auto-builder tool: 360.cyfidesigns.com
• LTX Video repo: github.com/Lightricks/ComfyUI-LTXVideo
• Unsloth Q4_K_M model: huggingface.co/unsloth/LTX-2.3-GGUF
