The basis for this workflow is my original WAN2.1 training guide: "My WAN2.1 LoRa training workflow TLDR" (on Civitai).
In this new article I will explain only the necessary differences between WAN2.2 and WAN2.1 training!
For everything else consult the old guide.
1. Dataset and Captions
No differences.
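If you don't have the old guide open: the file referenced later as /workspace/musubi-tuner/dataset/dataset.toml can be created on the instance (after the setup in the next section) roughly like this. The field names follow musubi-tuner's dataset config format, but the resolution, paths and repeat count below are placeholders, not my actual settings (those are in the old guide):
mkdir -p /workspace/musubi-tuner/dataset
cat > /workspace/musubi-tuner/dataset/dataset.toml << 'EOF'
[general]
resolution = [960, 960]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true
bucket_no_upscale = false

[[datasets]]
image_directory = "/workspace/musubi-tuner/dataset/images"
cache_directory = "/workspace/musubi-tuner/dataset/cache"
num_repeats = 1
EOF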
2. VastAI
New command:
git clone --recursive https://github.com/kohya-ss/musubi-tuner.git
cd musubi-tuner
git checkout feature-wan-2-2
git checkout d0a193061a23a51c90664282205d753605a641c1
apt install -y libcudnn8=8.9.7.29-1+cuda12.2 libcudnn8-dev=8.9.7.29-1+cuda12.2 --allow-change-held-packages
python3 -m venv venv
source venv/bin/activate
pip install -e .
pip install protobuf
pip install six
pip install torch==2.7.0 torchvision==0.22.0 xformers==0.0.30 --index-url https://download.pytorch.org/whl/cu128
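Optionally, before downloading anything, a quick sanity check (not part of the original workflow, just a convenience) that the intended PyTorch build is active inside the venv:
python -c "import torch, torchvision, xformers; print(torch.__version__, torchvision.__version__, xformers.__version__, torch.cuda.is_available())"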
Downloading the necessary models:
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P models_t5_umt5-xxl-enc-bf16.pth --local-dir models/text_encoders
huggingface-cli download Comfy-Org/Wan_2.1_ComfyUI_repackaged split_files/vae/wan_2.1_vae.safetensors --local-dir models/vae
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_Repackaged split_files/diffusion_models/wan2.2_t2v_high_noise_14B_fp16.safetensors --local-dir models/diffusion_models
huggingface-cli download Comfy-Org/Wan_2.2_ComfyUI_Repackaged split_files/diffusion_models/wan2.2_t2v_low_noise_14B_fp16.safetensors --local-dir models/diffusion_models
Everything else is again the same.
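Note that because --local-dir is combined with repo paths that include split_files/, the VAE and diffusion model files end up in nested folders. That is why the training commands below use the long split_files/... paths. A quick check that everything is where those commands expect it:
ls models/text_encoders/
ls models/vae/split_files/vae/
ls models/diffusion_models/split_files/diffusion_models/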
3. Training
Everything is the same except for the training commands. You obviously need to train the LoRa on both the Low-noise and High-noise models separately (using the same dataset etc.).
High-noise training command:
accelerate launch --num_cpu_threads_per_process 1 src/musubi_tuner/wan_train_network.py --task t2v-A14B --dit /workspace/musubi-tuner/models/diffusion_models/split_files/diffusion_models/wan2.2_t2v_high_noise_14B_fp16.safetensors --vae /workspace/musubi-tuner/models/vae/split_files/vae/wan_2.1_vae.safetensors --t5 /workspace/musubi-tuner/models/text_encoders/models_t5_umt5-xxl-enc-bf16.pth --dataset_config /workspace/musubi-tuner/dataset/dataset.toml --xformers --mixed_precision fp16 --fp8_base --optimizer_type adamw --learning_rate 3e-4 --gradient_checkpointing --gradient_accumulation_steps 1 --max_data_loader_n_workers 2 --network_module networks.lora_wan --network_dim 16 --network_alpha 16 --timestep_sampling shift --discrete_flow_shift 1.0 --max_train_epochs 100 --save_every_n_epochs 100 --seed 5 --optimizer_args weight_decay=0.1 --max_grad_norm 0 --lr_scheduler polynomial --lr_scheduler_power 8 --lr_scheduler_min_lr_ratio="5e-5" --output_dir /workspace/musubi-tuner/output --output_name WAN2.2-HighNoise_SmartphoneSnapshotPhotoReality_v3_by-AI_Characters --metadata_title WAN2.2-HighNoise_SmartphoneSnapshotPhotoReality_v3_by-AI_Characters --metadata_author AI_Characters --preserve_distribution_shape --min_timestep 875 --max_timestep 1000
Low-noise training command:
accelerate launch --num_cpu_threads_per_process 1 src/musubi_tuner/wan_train_network.py --task t2v-A14B --dit /workspace/musubi-tuner/models/diffusion_models/split_files/diffusion_models/wan2.2_t2v_low_noise_14B_fp16.safetensors --vae /workspace/musubi-tuner/models/vae/split_files/vae/wan_2.1_vae.safetensors --t5 /workspace/musubi-tuner/models/text_encoders/models_t5_umt5-xxl-enc-bf16.pth --dataset_config /workspace/musubi-tuner/dataset/dataset.toml --xformers --mixed_precision fp16 --fp8_base --optimizer_type adamw --learning_rate 3e-4 --gradient_checkpointing --gradient_accumulation_steps 1 --max_data_loader_n_workers 2 --network_module networks.lora_wan --network_dim 16 --network_alpha 16 --timestep_sampling shift --discrete_flow_shift 1.0 --max_train_epochs 100 --save_every_n_epochs 100 --seed 5 --optimizer_args weight_decay=0.1 --max_grad_norm 0 --lr_scheduler polynomial --lr_scheduler_power 8 --lr_scheduler_min_lr_ratio="5e-5" --output_dir /workspace/musubi-tuner/output --output_name WAN2.2-LowNoise_SmartphoneSnapshotPhotoReality_v3_by-AI_Characters --metadata_title WAN2.2-LowNoise_SmartphoneSnapshotPhotoReality_v3_by-AI_Characters --metadata_author AI_Characters --preserve_distribution_shape --min_timestep 0 --max_timestep 875
The only differences between the two are the model being trained on, the timestep range at the end, and the output/metadata names. Everything else is the same.
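For clarity, these are the flags that change between the two runs (paths shortened here for readability):
High noise: --dit .../wan2.2_t2v_high_noise_14B_fp16.safetensors --min_timestep 875 --max_timestep 1000
Low noise: --dit .../wan2.2_t2v_low_noise_14B_fp16.safetensors --min_timestep 0 --max_timestep 875
plus the WAN2.2-HighNoise_... / WAN2.2-LowNoise_... output and metadata names.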
That's it. Nothing else needs to be changed. It works very well and gives much better results than WAN2.1.
Note that I also changed the training commands to use a dim and alpha of 16 and a learning rate scheduler power of 8, as opposed to my original guide's recommendation of 32 and 4 respectively, as I find these values to work better while also producing a model half the size.
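In flag form (assuming alpha matched dim in the old setup, i.e. 32), that change is:
New: --network_dim 16 --network_alpha 16 --lr_scheduler_power 8
Old: --network_dim 32 --network_alpha 32 --lr_scheduler_power 4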
I recommend using my inference workflow for WAN2.2 for the best results: https://www.dropbox.com/scl/fi/pfpzff7eyjcql0uetj1at/WAN2.2_recommended_default_text2image_inference_workflow_by_AI_Characters-v3.json?rlkey=nyu2rfsxxszf38phflacgiseg&st=xa2zwgb4&dl=1
If you want to support what I do, feel free to donate here: https://ko-fi.com/aicharacters


