This article was machine-translated from Japanese to English, so we would appreciate it if you could point out any typos or errors.
This article may be translated or reproduced without restriction, and there is no need to report reprints.
This article was written in March 2026, and the information contained herein will be outdated in a few months.
Prerequisites
This article explains how to run WAN2.2 I2V training on a local PC.
The training tool is "AI Toolkit," which is installed using "StabilityMatrix".
🚩If you only want to see the training settings, please refer only to ■Chapter 5■ (●Create JOB).
⚠️The content of this article may be updated periodically based on the author's experience.
■Chapter 1 - Physical Environment -■
WAN2.2 training has very demanding operating environment requirements.
As a practical minimum requirement, a CUDA-compatible GPU with 32GB or more of memory is required.
For local training, the minimum requirement is either an "NVIDIA RTX 5090 (32GB)" or an "NVIDIA RTX PRO 4500 Blackwell (32GB)".

■Chapter 2 - Software Environment -■
"StabilityMatrix" is the ideal environment for trying out the latest AI tools.
Installing various packages is easy.
Updates are also easy.
However, for full-scale operation, it may be better to set up a separate application environment via Git, outside of "StabilityMatrix", for various reasons.
⚠️For the most up-to-date installation guide, it might be best to consult an AI.
1. "StabilityMatrix" Installation Guide
GitHub: https://github.com/LykosAI/StabilityMatrix
Guide(JP): https://shichisan-blog.com/stability-matrix_dounyuu/
2. "AI-ToolKit" Installation Guide
GitHub: https://github.com/ostris/ai-toolkit
Guide(JP): https://note.com/aiaicreate/n/nd4f8e7d95efe
3. "HENTAI APP" Installation & User Manual Guide
When creating clip videos for datasets, I use the HENTAI APP.
GitHub: https://github.com/akitoshi1/HENTAI_APP

■Chapter 3 - HIGH or LOW -■
Conclusion first:
For simple motions, prioritize HIGH.
If detailed additional rendering is required, prioritize LOW.
Before starting the training, you need to reconfirm the special specifications of WAN2.2.
That is, the existence of "High noise" (hereafter "HIGH") and "Low noise" (hereafter "LOW").
WAN2.2 gained very flexible video generation capabilities by dividing the model into HIGH and LOW.
However, users must stay constantly aware of HIGH and LOW, which makes it a difficult specification to work with.
Furthermore, for model trainers, this split-model specification is truly crap, a towering pile of crap. Crap!
HIGH: Generates overall motion such as camera work and character movement motion.
LOW: Generates detailed rendering such as object details and additional objects.
Before training a model, decide which it will be:
A motion model that prioritizes HIGH?
Or a detailed-rendering model that prioritizes LOW?
Please be sure to settle this before you begin creating your dataset.
This HIGH or LOW recognition will have a significant impact on the model's training results.
This article presents the following two models as examples.
[Explanation Sample] HIGH Model
Walking Rotation(WAN2.2 I2V 14B) Loop walking concept

[Explanation Sample] LOW Model
Drift stop from close-up face (WAN2.2 I2V) (ANIME) AKIRA in Kaneda's Bike Drift Stop

■Chapter 4 - Creating Dataset -■
Conclusion first:
If you have 32GB of GPU memory, prepare video clips with playback times of 3 seconds or less.
The starting point is of utmost importance; be sure to choose a starting point that matches the starting image you will specify in I2V.
For a HIGH model, 5 clips or fewer are sufficient.
For a LOW model, prepare 10 to 20 clips.
HIGH and LOW Dataset Sample
https://mega.nz/folder/1gRzUAoY#sYIcL4lPvPZWKaeeZ--VTw
HIGH Dataset Sample (4 clips)

LOW Dataset Sample (21 clips)

● How to Generate Clips
For a WAN2.2 dataset, you specify video files.
There are two main approaches to preparing them:
1. Extract clips from existing videos.
2. Create clips using WAN2.2.
I choose between them case by case.
For HIGH models that prioritize motion and camera work that cannot be reproduced with WAN2.2, I extract clips from existing videos using the HENTAI APP.
For LOW models that can be reproduced with WAN2.2 but require detailed rendering of the final frame, I generate clips by specifying the start and end images in WAN2.2.
● Clip Playback Time
Please try to keep the clip playback times as consistent as possible.
If there are variations of even a few seconds in playback time, it may have a significant impact on the model's quality.
If your GPU memory is 32GB, 4 seconds is the limit.
Even at 4 seconds, you should expect considerable degradation due to dropped frames.
Please try to unify the playback time to within 3 seconds whenever possible.
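As one way to keep playback times uniform, clips can be trimmed with ffmpeg. The sketch below only builds the command so you can review it before running; the file names, starting timestamp, and 3-second duration are placeholder assumptions, and ffmpeg itself is not part of AI-Toolkit and must be installed separately.

```python
import subprocess  # only needed if you actually run the command

def build_trim_command(src, dst, start="00:00:05.0", duration=3.0):
    """Build an ffmpeg command that cuts a fixed-length clip.

    Re-encoding (no stream copy) keeps the cut frame-accurate, which
    matters because the clip's first frame becomes the I2V start image.
    """
    return [
        "ffmpeg", "-y",
        "-ss", str(start),    # seek to the chosen starting point
        "-i", src,
        "-t", str(duration),  # uniform clip length (3 seconds here)
        "-an",                # audio is not used for training
        dst,
    ]

cmd = build_trim_command("source_video.mp4", "clip_01.mp4")
# subprocess.run(cmd, check=True)  # uncomment once ffmpeg is installed
```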
● Number of Clips
There is no definitive answer regarding the appropriate number of clips. The following are my guidelines based on experience.
For HIGH models: 1 to 5 clips should be sufficient. For truly simple models, even 1 clip is enough.
For LOW models: prepare 10 to 20 clips. Increasing the number further doesn't seem to contribute much to diversity.
●Clip Content
Naturally, a dataset of disjointed clips will generate a disjointed model.
The first frame of each clip plays the same role as the I2V starting image.
When extracting clips from a video, carefully select a starting point that matches the intended starting image.
For the clip content, make the parts you want the model to reproduce share the same movement and rendering.
For the parts you don't want reproduced, make them as different as possible.
●Regarding video captions
For captions, specify the common positive prompt used when generating the videos with WAN2.2.
If the starting images differ between clips, describe that difference in the caption.
(e.g., Fullbody or Cowboy shot, Standing or Sitting)
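A minimal sketch of how such captions might be written out as files, assuming the common AI-Toolkit convention of pairing each clip with a same-named .txt caption file (the clip names and prompt phrases below are illustrative placeholders; confirm the convention against your AI-Toolkit version):

```python
from pathlib import Path

# Common positive prompt shared by every clip, plus per-clip differences.
# Clip names and phrases are illustrative placeholders.
COMMON_PROMPT = "a character walks forward, smooth camera motion"
DIFFERENCES = {
    "clip_01.mp4": "fullbody, standing",
    "clip_02.mp4": "cowboy shot, sitting",
}

def write_captions(dataset_dir):
    """Write one same-named .txt caption next to each clip."""
    dataset_dir = Path(dataset_dir)
    for clip, diff in DIFFERENCES.items():
        caption = f"{COMMON_PROMPT}, {diff}"
        (dataset_dir / clip).with_suffix(".txt").write_text(caption)
```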
■Chapter 5 - Training with AI-Toolkit -■
●Datasets
Place the dataset in the "Dataset" folder of AI-ToolKit.
In this example article, the path is as follows:
D:\StabilityMatrix\Packages\AI-Toolkit\datasets
Click "Datasets" in the menu on the left.
Verify that you can select the dataset you want to use for training in the list that appears on the right.

Check each video and its caption in the dataset.
You can edit captions and delete clips from here.

●New JOB
Click "New Job" in the menu on the left.
📘For a detailed explanation of each parameter, please refer to this article.
https://www.runcomfy.com/trainer/ai-toolkit/wan-2-2-i2v-14b-lora-training
The initial state will be displayed as follows:

🚩This article only describes the parameter settings for a local PC (GPU 32GB).
This section explains common items that should be changed from their default values.
1️⃣JOB

Training Name:
Please enter the name of the model to be output. (The suffixes "_high_noise" and "_low_noise" will be appended automatically.)
Trigger Word:
Please set a common trigger prompt that you will specify when generating with WAN2.2 I2V.
2️⃣MODEL

Model Architecture:
Select "WAN 2.2 I2V (14B)"
Options-Low VRAM:

3️⃣Quantization

Transformer:
Select "4 bit with ARA"
4️⃣Multistage
Case of HIGH Model
Switch Every:
Set '10'

Case of LOW Model
Switch Every:
Set '35'

5️⃣Target

No changes.
6️⃣Save

No changes.
7️⃣Training
Case of HIGH Model
Steps:
Set '500'-'1000'
(I specify 500 and, if that fails, 1000. It might be wiser to specify 1000 from the start.)

Case of LOW Model
Steps:
Set '1500'-'2000'
(I specify 1500 and, if that fails, 2000. It might be wiser to specify 2000 from the start.)

Approximate training times:
⏳500 steps: 3 hours...
⏳1500 steps: 10 hours ...
⏳2000 steps: 15 hours ...
⏳3000 steps: 24 hours ...
Time is the most expensive cost. Train smartly.
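For planning, the timings above can be linearly interpolated. This is a rough sketch based only on the author's table (measured on one RTX 5090 setup), not a general rule; your hardware and dataset will give different numbers.

```python
# (steps, hours) pairs from the table above, measured on the author's setup.
TIMINGS = [(500, 3.0), (1500, 10.0), (2000, 15.0), (3000, 24.0)]

def estimate_hours(steps):
    """Linearly interpolate wall-clock training time from the measured pairs."""
    pts = sorted(TIMINGS)
    if steps <= pts[0][0]:
        return pts[0][1] * steps / pts[0][0]
    for (s0, h0), (s1, h1) in zip(pts, pts[1:]):
        if steps <= s1:
            return h0 + (h1 - h0) * (steps - s0) / (s1 - s0)
    # beyond the table: extrapolate from the last measured rate
    return pts[-1][1] * steps / pts[-1][0]
```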
8️⃣DATASETS

Target Dataset:
Please select the dataset you wish to use for training.
Num Frames:
For clips of 1 second: 18
For clips of 2 seconds: 33
For clips of 3 seconds or more: 39-42
If you have 32GB of GPU memory, you can specify 42... however, depending on the number of clips, training may exceed the physical GPU memory.
For safety, I cap the value at 39.
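Expressed as a small helper, the guideline above might look like this; the values come straight from the table above, and the 39-frame default cap reflects the 32GB safety margin just described (the function name and cap parameter are illustrative):

```python
def num_frames_for(seconds, safe_cap=39):
    """Return a Num Frames value following the table above.

    1 second -> 18, 2 seconds -> 33, 3 seconds or more -> up to 42,
    capped at 39 by default as a 32GB-VRAM safety margin.
    """
    if seconds <= 1:
        frames = 18
    elif seconds <= 2:
        frames = 33
    else:
        frames = 42
    return min(frames, safe_cap)
```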
Resolutions:

9️⃣Sample

Advanced Sampling:

⚠️If Disable Sampling is not set to "ON", a fatal error will occur at the start of training.
Create Job
Once you have finished setting the parameters, please click "Create Job".

Click the "▶️" button in the upper right corner of the screen to start the training.

Once the progress bar shows "1," your training has begun.
Congratulations!!
Now all you have to do is wait!!

⚠️As of March 2026, a fatal error occurs when starting training on a fresh installation of AI-Toolkit🤪🤘.
・Error running job: Failed to import diffusers.schedulers.scheduling_dpmsolver_multistep because of the following error (look up to see its traceback):

Solution: The most reliable solution is to downgrade NumPy to the stable version 1.26.4.
1. Please select AI-Toolkit from the Packages tab in StabilityMatrix.

2. Click the "︙ (ellipsis)" in the upper right corner.

3. Open Python Packages and search for numpy in the list.

4. Change (downgrade) the version to 1.26.4 and apply.

・Error running job: The size of tensor a (36) must match the size of tensor b (16) at non-singleton dimension 1

Solution: Please set "Disable Sampling" in the sample to ON.

⬇️

I hope this annoying error is fixed as soon as possible.
■Chapter 6 - What is WAN2.2 Training? -■
WAN2.2 training requires an enormous amount of time.
⏳500 steps: 3 hours...
⏳1500 steps: 10 hours or more...
⏳2000 steps: 15 hours or more...
⏳3000 steps: 24 hours or more...
During this time, GPU utilization stays pinned at 100%, and on an NVIDIA RTX 5090, GPU power draw stays pinned at 550W.
Of course, the PC cannot be used for any other purpose during training.
What is gained from this?
WAN2.2 is already becoming a legacy platform.
WAN2.2 training skills will quickly become obsolete.
Nevertheless, WAN2.2 possesses a "freedom" that other platforms lack.
That freedom granted me unlimited and unrestricted video generation capabilities.
I believe Illustrious's image generation has granted you unlimited and unrestricted image generation capabilities.
Wouldn't you like to acquire unlimited and unrestricted video generation capabilities next?
■Finally■
Take Wednesday off from training.
Otherwise, Windows Update will ruin everything and drive you crazy...

I express my heartfelt respect and gratitude to everyone involved with CivitAI and all of its users.
I sincerely hope that CivitAI remains the most ROCK place of all, overflowing with love, freedom, peace, equality, and HENTAI.
thank you !!
(suteakasuteakasuteka434)

