
KPOP Idol turn

Updated: Jul 30, 2025

Tags: concept, kpopdance

Format: SafeTensor

Type: LoRA

Published: Jul 19, 2025

Base Model: Wan Video 14B t2v

Training

Steps: 4,655
Epochs: 133

Usage Tips

Clip Skip: 1
Strength: 1

Trigger Words

doing the zxtp_1y7urn move

Training Images: download available

Hash (AutoV2): D81885037B

ZXTOPOWER

2025-07-21

Alright, v2.0 is finally here!

After releasing v1.0, my first public LoRA, I honestly had no idea how much testing it would take just to get one of these out the door. Haha...

This whole process has given me a newfound respect for everyone who makes and shares LoRAs; the amount of time and effort involved is just incredible.

In my notes for v1.0, I mentioned that I got the best results by training on full-length video clips at once. I stuck with that method here, but it looks like it created a new little problem: if you try to generate a video longer than the training data (e.g., more than the original 2 seconds or 32(+1) frames), the motion gets skipped. This seems to be because the model is learning the entire motion at once without any breaks. I thought about trying to fix this by slicing the motion into different sections, but I'd already sunk so much time into this version that I decided to save that technique for a future concept.
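
In practice this means capping the generated frame count at what the training clips covered. A minimal sketch of that budget, assuming the usual Wan 2.1 4n+1 frame-count convention (an assumption about the base model, not something specific to this LoRA):

```python
# Minimal sketch: keep the generated clip within the motion span the LoRA saw in
# training (2 s at 16 fps). The 4n+1 frame-count rule is an assumption based on
# Wan 2.1's temporal compression, not something specific to this LoRA.
TRAIN_SECONDS = 2
TRAIN_FPS = 16

def safe_num_frames(seconds: float = TRAIN_SECONDS, fps: int = TRAIN_FPS) -> int:
    """Largest 4n+1 frame count that fits inside the trained motion span."""
    frames = int(seconds * fps)      # 32 raw frames
    return (frames // 4) * 4 + 1     # 33, matching the "32(+1)" note above

print(safe_num_frames())  # 33; asking for more tends to skip parts of the move
```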

Another change from v1.0 is that I didn't do the block weight adjustments this time. I realized that while those tweaks worked great in my personal workflow, they weren't a one-size-fits-all solution and didn't always work well in every environment.

So, while v2.0 is still far from perfect, I think it's the best one from the current training batch. I'm already thinking about a future version with even better motion, but no promises on when that'll be ready!


Quick heads-up: I didn't get to fully test this with img2vid. πŸ™

It tends to work best on images where the full body is visible, just like in the training data. If parts of the body are covered up or occluded, the motion might get a little wonky.

2025-07-19

Hey everyone!

I'm working on v2 now.

And I'm using all the awesome work you've shared in the gallery as a reference!

Honestly, I had no idea so many people would be interested in this concept!

Hope to release it soon, thanks!

2025-07-15

Hotfix v1.1 - Re-added important weights that were excluded in the previous version.

I2V isn't quite working. My LoRA is super picky about reference images. I'll test more and get it right in the next version. SORRY! πŸ™Œ

Recommendations

Workflow: Kijai - WanVideoWrapper T2V, FusionX

Utility LoRA: Wan Self Forcing Rank 16 (Accelerator) at strength 1.0

Sampler: LCM, flowmatch_distill, UniPC

Recommended LoRA Strength: 0.8 - 1.2

πŸ‘‰(The training data includes a workflow)
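
For anyone not using ComfyUI, here is a minimal text-to-video sketch assuming the Hugging Face diffusers WanPipeline API and that its LoRA loader accepts this file's key layout; the file name, adapter name, and prompt are placeholders, and the Kijai workflow above remains the reference setup:

```python
# Illustrative sketch only; the recommended setup is Kijai's WanVideoWrapper in ComfyUI.
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# Load the dance LoRA within the recommended strength range (0.8 - 1.2).
pipe.load_lora_weights("kpop_idol_turn_v2.safetensors", adapter_name="idol_turn")
pipe.set_adapters(["idol_turn"], adapter_weights=[1.0])

video = pipe(
    prompt="a woman on stage, full body visible, doing the zxtp_1y7urn move",
    height=1024,
    width=576,
    num_frames=33,          # stay within the trained 2-second motion span
    guidance_scale=5.0,
).frames[0]
export_to_video(video, "idol_turn.mp4", fps=16)
```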

About This LoRA

This is my first public LoRA release.

The sample videos were generated using the workflow by Kijai. I have also confirmed that this LoRA works with Wan 2.1 14B I2V models.

The trigger phrase is "doing the zxtp_1y7urn move". However, I've noticed that the motion can be activated simply by increasing the LoRA strength, even without the trigger phrase. To be honest, I'm not entirely sure why this happens! :p

Training Details

Dataset: The model was trained on 10 video clips, each 2 seconds long at 16 fps with a resolution of 576x1024.
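
For reference, a rough preprocessing sketch that conforms source clips to that spec with ffmpeg; this is illustrative only, not necessarily the pipeline used for this LoRA, and the directory names are placeholders:

```python
# Conform source clips to 2 s, 16 fps, 576x1024. Requires ffmpeg on PATH.
import subprocess
from pathlib import Path

def conform(src: Path, dst: Path) -> None:
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", str(src),
            "-t", "2",     # trim to the 2-second motion
            "-r", "16",    # resample to 16 fps
            "-vf", "scale=576:1024:force_original_aspect_ratio=increase,crop=576:1024",
            "-an",         # drop audio
            str(dst),
        ],
        check=True,
    )

out_dir = Path("dataset")
out_dir.mkdir(exist_ok=True)
for i, clip in enumerate(sorted(Path("raw_clips").glob("*.mp4"))):
    conform(clip, out_dir / f"clip_{i:02d}.mp4")
```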

Captioning Process: Captions were generated in a two-stage process. First, 64 frames were extracted from each video for initial captioning with the Qwen-VL-7B model. These captions were then rewritten and refined by the Qwen-32B model.
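
A rough outline of that two-stage flow is sketched below. The frame sampling uses OpenCV; the two model calls are hypothetical stubs standing in for Qwen-VL-7B and Qwen-32B, since the actual inference code is not part of this card:

```python
import cv2
from pathlib import Path

def sample_frames(video_path: Path, num_frames: int = 64) -> list:
    """Evenly sample frames from a clip for the vision-language model."""
    cap = cv2.VideoCapture(str(video_path))
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    wanted = (
        {round(i * (total - 1) / (num_frames - 1)) for i in range(num_frames)}
        if total > 1 else {0}
    )
    frames = []
    for idx in range(total):
        ok, frame = cap.read()
        if not ok:
            break
        if idx in wanted:
            frames.append(frame)
    cap.release()
    return frames

def caption_with_vlm(frames) -> str:
    # Hypothetical stub: this is where Qwen-VL-7B would produce the initial caption.
    return "a person performs a turning dance move in a studio"

def rewrite_with_llm(draft: str) -> str:
    # Hypothetical stub: Qwen-32B rewrites/refines the draft caption here.
    return draft

for clip in sorted(Path("dataset").glob("*.mp4")):
    caption = rewrite_with_llm(caption_with_vlm(sample_frames(clip)))
    clip.with_suffix(".txt").write_text(caption)  # caption file saved next to the clip
```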

Methodology: I experimented with various methods to train a LoRA for continuous motion like dancing. I found that training on full-length video clips at once yielded the best results. Furthermore, it seemed more effective to omit detailed motion descriptions in the captions and let the model infer the action itself.

Overfitting Issue: A side effect of this approach is that the model learned the entire content of the videos, leading to significant overfitting. I plan to conduct further research into more effective captioning strategies to address this.

Post-processing & Limitations: After training, I adjusted the block weights to minimize the inclusion of unnecessary details. This significantly reduced image degradation even at higher strengths. However, the problem of the model directly replicating the training data remains. As mentioned, the LoRA is heavily biased towards the dataset, so prompts for attributes like clothing or gender will not be very effective.
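
Block-weight adjustments of this kind can also be baked into a copy of the LoRA file by scaling selected blocks' weights. A hedged sketch follows; the key pattern, the LoRA matrix naming, and the choice of blocks and scales are all assumptions for illustration, not the values used for this model:

```python
# Scale the LoRA delta for selected transformer blocks and save a tweaked copy.
import re
from safetensors.torch import load_file, save_file

SCALE_BY_BLOCK = {i: 0.5 for i in range(10)}   # example: dampen blocks 0-9

state = load_file("kpop_idol_turn_v1.safetensors")
for key in list(state.keys()):
    m = re.search(r"blocks\.(\d+)\.", key)     # assumed key pattern
    if m and ("lora_up" in key or "lora_B" in key):   # scaling the up/B matrix scales the delta
        state[key] = state[key] * SCALE_BY_BLOCK.get(int(m.group(1)), 1.0)

save_file(state, "kpop_idol_turn_v1_blockweighted.safetensors")
```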

I welcome all feedback and constructive criticism on my LoRA.