SCAIL-2 Two-Person Reference Editing Long-Video Workflow

Rating: 5 (5 reviews)
Author: AIKSK

Updated: Jun 18, 2026

Tag: character
File: SCAIL-2+双人参考编辑长视频.json (config, 97.35 KB, verified a month ago)

Details

Type: Workflows
Reviews: Positive (5)
Published: Jun 18, 2026
Base Model: Wan Video 2.2 T2V-A14B
Hash (AutoV2): 975DF9C2D3
Creator: AIKSK (joined Mar 29, 2023)
License: Apache 2.0

Watch the full video first to see how this SCAIL-2 two-person reference editing workflow behaves in practice. It shows two reference characters replacing two selected people in a driving video while keeping left-right identity alignment, synchronized motion, the original scene structure, lighting coherence, and long-video continuity stable.

This ComfyUI workflow is designed for SCAIL-2 two-person reference editing. Its main purpose is to take a two-person reference image and use it to replace both selected people in a two-person driving video. Unlike a simple two-person pose-driving workflow, this version is explicitly configured for character replacement. The reference image provides the two replacement identities, while the driving video provides the motion, timing, body interaction, camera rhythm, and original scene context.

The workflow is built around wan2.1_14B_SCAIL_2_fp8_scaled.safetensors as the main SCAIL-2 model. It also uses WAN VAE, UMT5 WAN text encoding, CLIP Vision, SAM3 tracking, SCAIL2ColoredMask, WanSCAILToVideo, SamplerCustom, VAEDecode, ForLoop continuation, overlap-frame trimming, ColorTransfer, final video combining, and original audio restoration. A multi-LoRA chain is preserved to improve motion quality, character stability, and final visual consistency.

The most important switch in this workflow is replacement_mode=true. This tells the SCAIL route to perform two-person skeleton guidance with reference character replacement. The positive prompt focuses on replacing both selected people with two reference characters, following two-person pose guidance, keeping left and right identity alignment, preserving the original scene structure, maintaining natural synchronized motion, coherent lighting, and smooth temporal consistency.

The negative prompt is also built for two-person failure cases. It suppresses bad video quality, flicker, only one person being replaced, missing second person, wrong identity order, identity swap, identity drift, deformed bodies, distorted faces, extra limbs, missing hands, blur, and low-quality output. These problems are especially common in two-person editing because the model has to preserve both bodies, both identities, and their interaction at the same time.

The workflow uses strict 512×896 alignment. Both the reference image and the driving video are resized to the same 512×896 canvas before entering SAM3, CLIP Vision, and SCAIL. This reduces tracking mismatch, mask instability, body distortion, and identity confusion.
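
To make the 512×896 alignment concrete, here is a minimal Python sketch of one way to bring any source size onto that canvas. It assumes a scale-to-cover resize followed by a center crop; the workflow's actual resize mode (crop, pad, or stretch) is not stated above, and the function name `fit_to_canvas` is hypothetical.

```python
def fit_to_canvas(src_w, src_h, dst_w=512, dst_h=896):
    """Return (resized_w, resized_h, crop_box) for a cover-fit onto dst_w x dst_h.

    Assumption: scale-to-cover plus center crop. This is an illustration,
    not the resize logic used inside the workflow.
    """
    # Scale so that both target dimensions are fully covered.
    scale = max(dst_w / src_w, dst_h / src_h)
    new_w = max(dst_w, round(src_w * scale))
    new_h = max(dst_h, round(src_h * scale))
    # Center the crop window on the resized frame.
    left = (new_w - dst_w) // 2
    top = (new_h - dst_h) // 2
    return new_w, new_h, (left, top, left + dst_w, top + dst_h)


if __name__ == "__main__":
    # A 1080x1920 portrait clip and a 1920x1080 landscape clip.
    print(fit_to_canvas(1080, 1920))
    print(fit_to_canvas(1920, 1080))
```

Applying the same function to both the reference image and every driving frame keeps the two inputs on an identical canvas, which is the point of the alignment rule.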

SAM3 is configured with max_objects=2. SCAIL2ColoredMask uses object_indices=0,1 and sort_by=left_to_right. This is the core left-right identity rule: the left reference character is matched to the left tracked person, and the right reference character is matched to the right tracked person. This makes the workflow more reliable for two-person dance, duet action, couple shots, character interaction, double digital human edits, and multi-character AI video production.
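
The left-right identity rule can be sketched in a few lines of Python. This is only an illustration of the idea behind sort_by=left_to_right: it assumes each person is represented by a bounding box and that ordering uses the horizontal box center. The `Box` class and both function names are hypothetical, and the real SCAIL2ColoredMask node may order objects differently.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def x_center(self) -> float:
        return (self.x_min + self.x_max) / 2


def order_left_to_right(boxes):
    """Return the indices of boxes sorted from leftmost to rightmost center."""
    return sorted(range(len(boxes)), key=lambda i: boxes[i].x_center)


def match_identities(reference_boxes, tracked_boxes):
    """Map each reference index to the tracked index at the same left-right rank."""
    if len(reference_boxes) != len(tracked_boxes):
        raise ValueError("reference and driving inputs must contain the same number of people")
    ref_order = order_left_to_right(reference_boxes)
    trk_order = order_left_to_right(tracked_boxes)
    return dict(zip(ref_order, trk_order))


if __name__ == "__main__":
    # Reference detections arrive right person first; tracked ones left first.
    reference = [Box(300, 100, 480, 800), Box(40, 100, 220, 800)]
    tracked = [Box(60, 120, 230, 820), Box(290, 110, 470, 810)]
    print(match_identities(reference, tracked))
```

Under this assumption, two people who stand close together or cross paths can change rank, which is one reason a clear left-right layout in both inputs matters.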

The long-video structure follows the SCAIL-2 continuation system. The first segment is 65 frames and establishes identity mapping, pose guidance, mask relationship, and the initial replacement result. The continuation segment is 81 frames. Each loop removes 5 overlapping frames, so every loop effectively adds 76 new frames. The loop count is calculated as max(1, ceil((F - 65) / 76)), where F is the loaded driving video frame count.
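
The segment arithmetic above can be checked with a short Python sketch. The constants and the loop-count formula come from the description; the function names and the `covered_frames` helper are illustrative additions, and the sketch assumes every loop contributes exactly 76 new frames.

```python
import math

FIRST_SEGMENT = 65                      # frames in the initial segment
CONTINUATION = 81                       # frames generated per continuation loop
OVERLAP = 5                             # overlapping frames removed per loop
NEW_PER_LOOP = CONTINUATION - OVERLAP   # 76 new frames per loop


def loop_count(total_frames: int) -> int:
    """Loop count as described: max(1, ceil((F - 65) / 76))."""
    return max(1, math.ceil((total_frames - FIRST_SEGMENT) / NEW_PER_LOOP))


def covered_frames(total_frames: int) -> int:
    """Frames covered by the first segment plus all loops (illustrative helper)."""
    return FIRST_SEGMENT + loop_count(total_frames) * NEW_PER_LOOP


if __name__ == "__main__":
    for f in (65, 141, 142, 300):
        print(f, loop_count(f), covered_frames(f))
```

Because of the max(1, ...) floor, at least one continuation loop always runs, so even a driving video of 65 frames or fewer is covered by 65 + 76 = 141 generated frames under this reading.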

The final output does not use an extra ImageCompositeMasked stage. The generated frames from the loop output are sent directly into the final video combine node. The original driving video audio is restored, and the frame rate is controlled by the unified FPS node, making the final result easier to match with the source rhythm.

Main features:

SCAIL-2 two-person reference editing workflow
Two reference characters replace two target people
Two-person skeleton-guided video editing
replacement_mode=true character replacement
512×896 unified input alignment
SAM3 max_objects=2 tracking
SCAIL2ColoredMask dual-person control
object_indices=0,1 target selection
sort_by=left_to_right identity alignment
CLIP Vision reference identity encoding
WanSCAILToVideo first-segment generation
65-frame initial segment
81-frame continuation segment
5-frame overlap trimming
ForLoop long-video continuation
Direct generated-frame final output
Original driving video audio restored
Unified 24fps output control
Multi-LoRA enhancement chain

Suggested workflow:

Prepare one clear two-person reference image and one clean two-person driving video. The reference image should show both characters clearly, with readable face, outfit, body shape, and left-right position. The driving video should contain two visible people with stable framing, readable motion, and limited occlusion. Keep the default 512×896 setting first. Check that SAM3 tracks two people correctly in both the reference and driving inputs. If identity order swaps, adjust the left-right layout or use a cleaner reference image. If only one person is replaced, check max_objects=2, object_indices=0,1, and replacement_mode=true. Run a short test first, then enable the long-video loop after both identities are stable.
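
The three settings named in the troubleshooting step can also be checked with a script before a long run. The sketch below is a hedged example: it assumes the workflow has been exported in ComfyUI's API format (a JSON object mapping node ids to entries with an "inputs" dictionary), it scans every node rather than assuming which node owns each setting, and the stored value types (for example object_indices as the string "0,1") are assumptions. The function name `find_mismatches` is hypothetical.

```python
import json

# Expected values taken from the description; the value types are assumptions.
EXPECTED = {
    "max_objects": 2,
    "object_indices": "0,1",
    "sort_by": "left_to_right",
    "replacement_mode": True,
}


def find_mismatches(workflow: dict, expected: dict = EXPECTED) -> list:
    """Return human-readable problems for settings that are wrong or missing."""
    problems = []
    seen = set()
    for node_id, node in workflow.items():
        inputs = node.get("inputs", {}) if isinstance(node, dict) else {}
        for key, want in expected.items():
            if key in inputs:
                seen.add(key)
                if inputs[key] != want:
                    problems.append(
                        f"node {node_id}: {key} is {inputs[key]!r}, expected {want!r}"
                    )
    for key in expected:
        if key not in seen:
            problems.append(f"{key}: not found in any node")
    return problems


if __name__ == "__main__":
    # Inline example instead of a real file; use json.load on an API-format export.
    example = json.loads(
        '{"1": {"class_type": "ExampleTracker", "inputs": {"max_objects": 1}},'
        ' "2": {"class_type": "SCAIL2ColoredMask",'
        ' "inputs": {"object_indices": "0,1", "sort_by": "left_to_right"}}}'
    )
    for problem in find_mismatches(example):
        print(problem)
```

The file offered for download is likely in the editor (UI) format, where widget values are stored positionally, so this check would only apply to a separate API-format export.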

⚙️ RunningHub Workflow

Try the workflow online right now, with no installation required.
👉 Workflow: https://www.runninghub.ai/post/2067141621002629121?inviteCode=rh-v1111

If the results meet your expectations, you can later deploy it locally for customization.

🎁 Fan Benefits: Register to get 1000 points, plus 100 points for each daily login, and enjoy 4090 performance with 48 GB.

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you’re in the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1jWL96nEpw/

☕ Support Me on Ko-fi

If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk

💼 Business Contact

For collaboration or inquiries, please contact aiksk95 on WeChat.

I keep model resources updated on Quark Drive:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to support creation and learning.