
Disguise Drop - LTX-2 / Wan2.1 14B FLF2V


Verified: SafeTensor

Type: LoRA

Stats: 747 · 140 · 1.2K

Published: May 19, 2025

Base Model: Wan Video 14B i2v 720p

Training: Epochs: 70

Usage Tips: Strength: 1

Hash: AutoV2 E824840D74
kabachuha

When the mask falls off… who’s really inside?

LTX-2 Disguise Drop!

A remake of my very first Wan LoRA, now for LTX-2! The sound effects take it to a new level!

The LoRA brings the classic unzipping-surprise cartoon effect, where a character suddenly unzips their skin to reveal another character who was inside them all along!

This time, unlike my other LTX-2 LoRAs so far, calibrated LR scheduling means it has virtually no grayness, and you can use it in image2video and first-last-frame2video modes alike!

The workflows are embedded in the videos, or you can use the .json file in the Hugging Face files folder at https://huggingface.co/kabachuha/ltx2-disguise-drop. It's the native workflow with KJNodes and VideoHelperSuite.

The training recipe is exactly the same as for my previous one-hand-squish LoRA, except for a lowered rank to prevent overfitting. This LoRA converged in 3520 steps.

The model was trained with "Disguise drop." as the prompt prefix, but you can also simply describe the action in detail.

Wan 2.1

This Wan2.1 first-last-frame LoRA aims to replicate the weird cartoonish full-body disguise transformation, where it turns out that another person or a cartoon character has been inside the source person all along (or vice versa), revealed when the first person unzips their skin.

This LoRA has been tested with kijai's Wan Wrapper nodes plus a ComfyUI Essentials histogram correction of 0.2 (otherwise the frames come out too dark), at the default LoRA weight of 1.0.

The LoRA has been trained using diffusion-pipe for 70 epochs with a variable flow shift, 4.3 (for ~ 40 epochs) -> 3.9, to first capture the macroscopic changes (the falling skin) and then the exact hand movements, using the Prodigy optimizer. The training resolutions are 688x688x49 and 688x688x65.
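The two-stage flow-shift schedule described above can be sketched as a tiny helper. This is only an illustration of the stated recipe, not code from the actual diffusion-pipe setup, and the 40-epoch switch point is approximate per the "~ 40 epochs" note:

```python
def flow_shift(epoch: int, switch_epoch: int = 40) -> float:
    """Two-stage flow-shift schedule: a higher shift (4.3) for roughly
    the first 40 epochs to capture macroscopic changes (the falling
    skin), then a lower shift (3.9) to refine the exact hand movements.
    The switch point is an assumption based on the "~ 40 epochs" note."""
    return 4.3 if epoch < switch_epoch else 3.9

# The schedule across all 70 training epochs:
schedule = [flow_shift(e) for e in range(70)]
```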

As the text encoder is not trained, keywords have little effect, and I recommend describing the action in detail following the prompt template below:

"""

The video begins with [object]. [She/He] touches the top of [her/his] head with [her/his] hand and then [she/he] fully unzips [her/his] skin in two halves slowly pulling the zip with [her/his] hand across [her/his] body from the top of [her/his] head, splitting [her/his] face, to the waist to reveal [target object] inside [her/him]. The [target object] then throws away the previous skin.

"""

An alternative ending sentence is "The previous skin then falls down.", if you don't want the skin tossed around. You can also use the same picture at the beginning and the end, resulting in an endless loop of unpeeling :)
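To avoid placeholder slips when filling the template by hand, the substitution can be sketched as a small helper. This is a hypothetical convenience function, not part of the released files; it lightly adapts the template's ending so the target's article reads naturally:

```python
def disguise_drop_prompt(source: str, target: str, pronoun: str = "she",
                         toss_skin: bool = True) -> str:
    """Fill the Disguise Drop prompt template for a source object and a
    target object (both with their article, e.g. "a cartoon fox"), and
    a pronoun ("she" or "he")."""
    subj, poss, objc = {"she": ("she", "her", "her"),
                        "he": ("he", "his", "him")}[pronoun]
    # Ending variation from the notes above; target.capitalize() stands
    # in for the template's "The [target object]" slot.
    ending = (f"{target.capitalize()} then throws away the previous skin."
              if toss_skin else "The previous skin then falls down.")
    return (f"The video begins with {source}. {subj.capitalize()} touches the top "
            f"of {poss} head with {poss} hand and then {subj} fully unzips {poss} "
            f"skin in two halves slowly pulling the zip with {poss} hand across "
            f"{poss} body from the top of {poss} head, splitting {poss} face, to "
            f"the waist to reveal {target} inside {objc}. {ending}")

print(disguise_drop_prompt("a woman in a red coat", "a cartoon fox", "she"))
```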

This is an experimental LoRA, and the success rate varies! The model sometimes tries to cheat by putting the subject into a "bag" for later unzipping. For the time being, add "bag, shell, case, hood, zipper, cloth, grainy, dark, movie shot, old movie, jungle, vegetation, plants" to the negative prompt. Naturally, the LoRA works best when the subjects share the same, or at least a similar, background.