Squish - One Hand Only! - Wan2.2 i2v 14b

Updated: Sep 14, 2025

Tags: action, hands, jam, squash, surreal, grab

Verified: Diffusers

Type: LoRA

Published: Sep 14, 2025

Base Model: Wan Video 2.2 I2V-A14B

Training: 10 epochs

Usage Tips: Strength 1

Trigger Words: sq41sh squish

Hash: AutoV2 00456E1249

Creator: kabachuha

This LoRA allows you to squish objects, not only when they are brought up to the camera, but at a distance too! Unlike the vanilla Squish, you can use a single hand alone without much prompting!

Prompt format and TL;DR

Prompt:

In the video, an [object] is presented. A person's long right arm appears from the side, stretches all the way to the [object] and grabs the [object]. The [object] is held in the person's right hand in the distance. The person then presses on the [object], causing a sq41sh squish effect. The person keeps pressing down on the [object], further showing the sq41sh squish effect.
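For illustration, here is a tiny hypothetical Python helper that fills the template above; the function name and the substitution mechanism are mine, while the wording is the prompt as given:

```python
# Hypothetical helper: fills the prompt template above for a given object.
def squish_prompt(obj: str) -> str:
    return (
        f"In the video, a {obj} is presented. A person's long right arm "
        f"appears from the side, stretches all the way to the {obj} and "
        f"grabs the {obj}. The {obj} is held in the person's right hand "
        f"in the distance. The person then presses on the {obj}, causing "
        f"a sq41sh squish effect. The person keeps pressing down on the "
        f"{obj}, further showing the sq41sh squish effect."
    )

print(squish_prompt("rubber duck"))
```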

My usage settings are straightforward: CFG 3.5, shift 8.0 (default), the dpm++-sde scheduler, and 25 (13+12) steps. No speed-up or other additional LoRAs were used. Made with Kijai's wrapper.

I used an uncond skip of the 10th block (SLG) to enhance the video quality a bit. No other enhancements were applied for the gallery generations. It all works great at 512x512, 640p, and 720p.

For details, see the workflows in the videos' metadata or their duplicates on GitHub.

So much for the TL;DR. Below, I present my struggles with creating this LoRA; I think it may be useful for fellow LoRA trainers and techy users in general.

The Problem

Why was I even motivated to create this LoRA?

The vanilla Squish LoRA and its synthetic distillations have a huge bias toward two hands, complete with their fingers, enveloping the object from the sides. Additionally, the object is torn from its spatial place and then moved closer to the camera, worsening the immersion. The cool idea, I thought, would be if you could not only attract the object, but, on the contrary, reach the object with an arm first!

Making only one hand appear is a nightmare of prompt engineering, LoRA strengths, and conditioning arithmetic in Comfy (at least it was for me), which is why I decided to create this fine-tune.

Methodology

Because of the bias, it was impossible to get the "fix" dataset from raw objects. So I changed the starting frame to hands resting near the objects, and the success rate increased. I thought it was over, but no: left hands intervened very often, bringing the object to the camera and basically undoing all the progress. Only after wasting multiple hours on rewriting prompts, adjusting the LoRA's strength, and averaging the conditionings with ComfyUI's coefficients did I find a suitable preset.

To compose the correcting dataset, I made a collection of hands touching various items in various places, then retrieved the last frame of each clip and continued it with the calibrated Wan2.2 image-to-video and the squish LoRA.

Before the concatenation, the "real" fragment was put through Wan's VAE to increase the plausibility. At such a resolution, a thin iridescent vertical line was visible at the right edge, and I cut it off manually with a homegrown "vibe-coded" program that used the same preset resolution for all videos, so as not to create a lot of aspect buckets. A sketch of that cleanup step follows below.
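A minimal sketch of such a cleanup pass, assuming OpenCV; the strip width and the preset resolution are illustrative guesses, not my exact values:

```python
# Hypothetical cleanup pass: crop the iridescent right-edge strip and
# resize every clip to one preset resolution (avoids extra aspect buckets).
import cv2

EDGE_CROP_PX = 8               # assumed width of the artifact strip
TARGET_W, TARGET_H = 480, 480  # assumed preset resolution (< 512 max dim)

def clean_clip(src_path: str, dst_path: str) -> None:
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 16.0
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"),
                          fps, (TARGET_W, TARGET_H))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = frame[:, :-EDGE_CROP_PX]             # drop the right edge
        out.write(cv2.resize(frame, (TARGET_W, TARGET_H)))
    cap.release()
    out.release()
```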

The dataset was composed of:

  1. 10 hand-crafted squish situations (two-part videos: first the hand reaches the object, then the hand squishes the object), plus 2 synthetic squishes where a hand touches an object (lazy image-to-video, without the "reaching" part).

  2. Regularization: 18 curated high-quality RemadeAI-derived synthetic samples from the Omni-VFX dataset.

As I said, the success rate was abysmal, and the dataset was tiny.

The data mix consisted of 4 repeats of the manual dataset plus 1 instance of the regularization set, per epoch.
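In diffusion-pipe, a mix like this is expressed through per-directory repeats in the dataset TOML. A hypothetical sketch (paths, resolution, and bucket values are illustrative, not my exact config):

```toml
# Hypothetical diffusion-pipe dataset.toml mirroring the mix above.
resolutions = [512]      # trained at < 512 max dim
frame_buckets = [81]     # 81-frame clips

[[directory]]
path = '/data/squish_manual'   # 10 hand-crafted + 2 synthetic clips
num_repeats = 4

[[directory]]
path = '/data/squish_reg'      # 18 curated regularization samples
num_repeats = 1
```

The data-parallel dual-GPU run itself would then be launched with something like `deepspeed --num_gpus=2 train.py --deepspeed --config <config.toml>`.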

Fine-tuning the squish LoRA took 6 epochs for the high-noise model and 4 epochs for the low-noise model; it ran for one night on dual GPUs, a 5090 + a 4090 (data parallel with diffusion-pipe).

The training was conducted with Prodigy as the optimizer and a pseudo-Huber loss with a Huber constant of 0.5 as the loss function, to experiment with this new setting.
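For reference, one common parameterization of the pseudo-Huber loss (a sketch, not the trainer's exact code; implementations differ in scaling):

```python
# Pseudo-Huber loss, one common form: L(a) = c^2 * (sqrt(1 + (a/c)^2) - 1).
# Quadratic (L2-like) for small residuals, linear (L1-like) for large ones;
# c = 0.5 is the Huber constant mentioned above.
import torch

def pseudo_huber_loss(pred: torch.Tensor, target: torch.Tensor,
                      c: float = 0.5) -> torch.Tensor:
    a = pred - target
    return (c * c * (torch.sqrt(1 + (a / c) ** 2) - 1)).mean()
```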

Usage settings

The usage settings are straightforward: CFG 3.5, shift 8.0 (default), the dpm++-sde scheduler, and 25 (13+12) steps. No speed-up or other additional LoRAs were used.

I used an uncond skip of the 10th block (SLG) to enhance the video quality a bit. No other enhancements were applied for the gallery generations.

The training resolution is < 512 max width/height; however, now with the bias removed, you can try going higher. 512 and 640 work, and you can get results even at 720p (because the LoRA is overtrained, ha-ha!). The gallery examples are at 720p, to show off :0

The number of training frames is 81 (~5 seconds at 16 fps, or ~3 at 24 fps), and this count is recommended. It also works at 65 frames; other counts are untested.
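If you prefer Diffusers over ComfyUI, a hypothetical sketch of these settings might look as follows; the LoRA filename and starting image are illustrative, and UniPC with flow_shift stands in for the dpm++-sde scheduler I used in Kijai's wrapper:

```python
# Hypothetical Diffusers sketch of the settings above: CFG 3.5, shift 8.0,
# 25 steps, 81 frames. The 13+12 split refers to the high-/low-noise expert
# handoff, which the two-model Wan2.2 pipeline handles internally.
import torch
from diffusers import WanImageToVideoPipeline, UniPCMultistepScheduler
from diffusers.utils import export_to_video, load_image

pipe = WanImageToVideoPipeline.from_pretrained(
    "Wan-AI/Wan2.2-I2V-A14B-Diffusers", torch_dtype=torch.bfloat16)
pipe.scheduler = UniPCMultistepScheduler.from_config(
    pipe.scheduler.config, flow_shift=8.0)
pipe.load_lora_weights("squish_one_hand.safetensors")  # illustrative filename
pipe.enable_model_cpu_offload()

image = load_image("object.png")  # illustrative starting image
prompt = ("In the video, a rubber duck is presented. A person's long right "
          "arm appears from the side, stretches all the way to the rubber "
          "duck and grabs the rubber duck. The person then presses on the "
          "rubber duck, causing a sq41sh squish effect.")
frames = pipe(image=image, prompt=prompt, num_frames=81,
              num_inference_steps=25, guidance_scale=3.5).frames[0]
export_to_video(frames, "squish.mp4", fps=16)
```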

I used Kijai's ComfyUI wrapper; native workflows may need slight adjustments. The workflows are attached as video metadata. If they are hard to extract, they are also available in JSON form at this GitHub Gist link.

The gallery starting images were produced with Qwen, SDXL, and Flux (Edit and Kontext included).

Limitations

The training was done with a right hand, to remove the bias faster, so it may not suit you if you are a lefty :)

Because it was trained mostly on synthetic data, you can encounter a thin iridescent line at the right edge of the frame. I made every effort to crop it when preparing the dataset, but a trace still appears sometimes. (If you look closely, this trace can appear even in your LoRA-less generations, due to Wan2.2's nature.) The trace vanishes at higher resolutions.

When I was making the dataset, it was virtually impossible to get the prompt to fit right at higher dimensions, even using the contraptions listed above, so I had to train at low resolution (< 512 max dim). This may result in minor blurriness.

Just like with the vanilla Squish, the legs occasionally disappear from subjects. Although I tried my best to remove such cases from the regularization dataset, this effect still happens, and I don't know why. :( Maybe it's a Wan training limitation. (Try describing the legs and the footwear in your prompts.)

If the object is too close to the camera, it can trigger the stock behavior and spawn a second hand, so it's best to work with visibly remote objects.

Ending words

This LoRA was made to get rid of Squish's two-hands bias and to allow smashing items at a distance, for more realism and shock effect. It was created in the hope of being fun and useful.

The best thing you can do for me is to share some examples of how it works for you in the gallery, whatever their rating :)

Of course, if you encounter a problem, leave a comment.

Credits to RemadeAI for the initial concept :3