My WAN2.2 LoRa training workflow TLDR

The basis for this workflow is my original WAN2.1 training guide: My WAN2.1 LoRa training workflow TLDR | Civitai

In this new article I will explain only the necessary differences between WAN2.2 and WAN2.1 training!

For everything else consult the old guide.

1. Dataset and Captions

No differences.
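
The dataset config format itself is also unchanged. Purely for reference, a minimal dataset.toml in the TOML format musubi-tuner expects could look roughly like the sketch below; the resolution, directories, and repeat count here are placeholders, so take your actual values from the original guide.

cat > /workspace/musubi-tuner/dataset/dataset.toml <<'EOF'
[general]
resolution = [960, 960]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true
bucket_no_upscale = false

[[datasets]]
image_directory = "/workspace/musubi-tuner/dataset/images"
cache_directory = "/workspace/musubi-tuner/dataset/cache"
num_repeats = 1
EOF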

2. VastAI

New command:

git clone --recursive https://github.com/kohya-ss/musubi-tuner.git
cd musubi-tuner
git checkout feature-wan-2-2
git checkout d0a193061a23a51c90664282205d753605a641c1
apt install -y libcudnn8=8.9.7.29-1+cuda12.2 libcudnn8-dev=8.9.7.29-1+cuda12.2 --allow-change-held-packages
python3 -m venv venv
source venv/bin/activate
pip install -e .
pip install protobuf
pip install six
pip install torch==2.7.0 torchvision==0.22.0 xformers==0.0.30 --index-url https://download.pytorch.org/whl/cu128
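
Optionally, you can sanity-check the environment before moving on (not part of the original guide, just a quick verification that torch sees the GPU and xformers imports cleanly):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import xformers; print(xformers.__version__)"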

Downloading the necessary models:

huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P models_t5_umt5-xxl-enc-bf16.pth --local-dir models/text_encoders
huggingface-cli download Comfy-Org/Wan_2.1_ComfyUI_repackaged split_files/vae/wan_2.1_vae.safetensors --local-dir models/vae
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_Repackaged split_files/diffusion_models/wan2.2_t2v_high_noise_14B_fp16.safetensors --local-dir models/diffusion_models
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_Repackaged split_files/diffusion_models/wan2.2_t2v_low_noise_14B_fp16.safetensors --local-dir models/diffusion_models
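
Note that huggingface-cli keeps the repo subfolders under --local-dir, so the VAE and diffusion models end up nested under split_files/, which is why the training commands below reference those longer paths. You can verify the layout with something like:

ls models/text_encoders/
ls models/vae/split_files/vae/
ls models/diffusion_models/split_files/diffusion_models/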

Everything else is again the same.

3. Training

Everything is the same except for the training command(s). You obviously need to train the LoRa on both the High-noise and Low-noise models separately (using the same dataset, etc.).

High-noise training command:

accelerate launch --num_cpu_threads_per_process 1 src/musubi_tuner/wan_train_network.py --task t2v-A14B --dit /workspace/musubi-tuner/models/diffusion_models/split_files/diffusion_models/wan2.2_t2v_high_noise_14B_fp16.safetensors --vae /workspace/musubi-tuner/models/vae/split_files/vae/wan_2.1_vae.safetensors --t5 /workspace/musubi-tuner/models/text_encoders/models_t5_umt5-xxl-enc-bf16.pth --dataset_config /workspace/musubi-tuner/dataset/dataset.toml --xformers --mixed_precision fp16 --fp8_base --optimizer_type adamw --learning_rate 3e-4 --gradient_checkpointing --gradient_accumulation_steps 1 --max_data_loader_n_workers 2 --network_module networks.lora_wan --network_dim 16 --network_alpha 16 --timestep_sampling shift --discrete_flow_shift 1.0 --max_train_epochs 100 --save_every_n_epochs 100 --seed 5 --optimizer_args weight_decay=0.1 --max_grad_norm 0 --lr_scheduler polynomial --lr_scheduler_power 8 --lr_scheduler_min_lr_ratio="5e-5" --output_dir /workspace/musubi-tuner/output --output_name WAN2.2-HighNoise_SmartphoneSnapshotPhotoReality_v3_by-AI_Characters --metadata_title WAN2.2-HighNoise_SmartphoneSnapshotPhotoReality_v3_by-AI_Characters --metadata_author AI_Characters --preserve_distribution_shape --min_timestep 875 --max_timestep 1000

Low-noise training command:

accelerate launch --num_cpu_threads_per_process 1 src/musubi_tuner/wan_train_network.py --task t2v-A14B --dit /workspace/musubi-tuner/models/diffusion_models/split_files/diffusion_models/wan2.2_t2v_low_noise_14B_fp16.safetensors --vae /workspace/musubi-tuner/models/vae/split_files/vae/wan_2.1_vae.safetensors --t5 /workspace/musubi-tuner/models/text_encoders/models_t5_umt5-xxl-enc-bf16.pth --dataset_config /workspace/musubi-tuner/dataset/dataset.toml --xformers --mixed_precision fp16 --fp8_base --optimizer_type adamw --learning_rate 3e-4 --gradient_checkpointing --gradient_accumulation_steps 1 --max_data_loader_n_workers 2 --network_module networks.lora_wan --network_dim 16 --network_alpha 16 --timestep_sampling shift --discrete_flow_shift 1.0 --max_train_epochs 100 --save_every_n_epochs 100 --seed 5 --optimizer_args weight_decay=0.1 --max_grad_norm 0 --lr_scheduler polynomial --lr_scheduler_power 8 --lr_scheduler_min_lr_ratio="5e-5" --output_dir /workspace/musubi-tuner/output --output_name WAN2.2-LowNoise_SmartphoneSnapshotPhotoReality_v3_by-AI_Characters --metadata_title WAN2.2-LowNoise_SmartphoneSnapshotPhotoReality_v3_by-AI_Characters --metadata_author AI_Characters --preserve_distribution_shape --min_timestep 0 --max_timestep 875

The only differences between the two commands are the model being trained, the output/metadata names, and the timestep settings at the end. Everything else is the same.
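
If you want to double-check that, one way (assuming you saved the two commands to train_high.sh and train_low.sh, which are just hypothetical filenames) is a word-level diff, which should only show those fields:

diff <(tr ' ' '\n' < train_high.sh) <(tr ' ' '\n' < train_low.sh)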

That's it. Nothing else needs to be changed. This works very well and gives much better results than WAN2.1.

Note that I also changed the training command(s) to use dim and alpha of 16 and an lr scheduler power of 8, as opposed to my original guide's recommendations of 32 and 4 respectively, as I find these values to work better while also producing a model half the size.

I recommend using my inference workflow for WAN2.2 for the best results: https://www.dropbox.com/scl/fi/pfpzff7eyjcql0uetj1at/WAN2.2_recommended_default_text2image_inference_workflow_by_AI_Characters-v3.json?rlkey=nyu2rfsxxszf38phflacgiseg&st=xa2zwgb4&dl=1

If you want to support what I do, feel free to donate here: https://ko-fi.com/aicharacters
