This is a bare-bones guide to using musubi-tuner to train a Wan2.2 LoRA with both bases/experts (high noise and low noise) at the same time, in a single training session, producing one LoRA that works with both bases. With conservative settings, this workflow has produced excellent LoRAs on a 12GB 3060.
This should get you started.
Installing musubi-tuner is straightforward; just follow the instructions at the repo (a minimal install sketch is included after the links below). Prepping data is the same as for any other video LoRA training. Training speed and efficiency depend on what your virtual environment has installed (Triton, Sage Attention, etc. - see the venv perks link), so set up the best venv your hardware supports.
musubi-tuner - https://github.com/kohya-ss/musubi-tuner
venv perks - https://civitai.com/articles/9555/installing-triton-and-sage-attention-flash-attention-and-x-formers-win
helpful video data prep - https://github.com/lovisdotio/VidTrainPrep
models - https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/tree/main/split_files/diffusion_models
umt5 and VAE - https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/tree/main
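If you want a concrete starting point on Windows, here is a minimal install sketch (run in a command prompt). It assumes the standard pip-based setup from the musubi-tuner README - a venv, a CUDA build of PyTorch, then the repo itself - so check the repo for the currently recommended PyTorch version and any extra requirements.
git clone https://github.com/kohya-ss/musubi-tuner.git
cd musubi-tuner
python -m venv venv
call venv\scripts\activate
rem install a CUDA-enabled PyTorch first; the index URL/version below is only an example, use what the repo README recommends
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
rem install musubi-tuner and its dependencies into the venv
pip install -e .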
Notes:
- images are fine for facial likeness, but for motion/action, videos are likely necessary
- do not use block swap, as it is incompatible with the offloading option, which is necessary to prevent OOMs on low VRAM systems
- these settings are conservative (below baseline) and should work on 12GB cards; adjust upward as needed for your hardware
- DIM/ALPHA at 24/24 worked for me and yields a 225MB LoRA that seems to work flawlessly; if you want a bigger LoRA, tweak to taste
(EDIT: I am now using 16/16 for facial likeness and the results are the same as with 20/20 or 24/24. Also, LR = 0.0001 seems to work just fine, but go any higher and you risk inhibiting motion and introducing artifacts.)
Here is a basic .bat for training:
@echo off
setlocal enabledelayedexpansion
set "WAN=C:\Users\user\musubi-tuner"
set "CFG=C:\Users\user\musubi-tuner_GUI_first\my tools and samples\configs\my-lora-wan.toml"
set "DIT_LOW=D:\Models\diffusion_models\Wan\wan2.2_t2v_low_noise_14B_fp16.safetensors"
set "DIT_HIGH=D:\Models\diffusion_models\Wan\wan2.2_t2v_high_noise_14B_fp16.safetensors"
set "VAE=D:\Models\VAE\Wan\Wan2.1_VAE.pth"
set "T5=D:\Models\clip\models_t5_umt5-xxl-enc-bf16.pth"
set "OUT=C:\Users\user\musubi-tuner\outputs\my-lora-2.2-dual"
set "OUTNAME=my-lora-2.2-dual"
set "LOGDIR=C:\Users\user\musubi-tuner\logs"
set "CUDA_VISIBLE_DEVICES=0"
cd /d "%WAN%"
call venv\scripts\activate
accelerate launch --num_processes 1 --mixed_precision fp16 ^
"src\musubi_tuner\wan_train_network.py" ^
--dataset_config "%CFG%" ^
--discrete_flow_shift 3 ^
--dit "%DIT_LOW%" ^
--dit_high_noise "%DIT_HIGH%" ^
--fp8_base ^
--fp8_scaled ^
--fp8_t5 ^
--gradient_accumulation_steps 1 ^
--gradient_checkpointing ^
--img_in_txt_in_offloading ^
--learning_rate 0.000025 ^
--log_with tensorboard ^
--logging_dir "%LOGDIR%" ^
--lr_scheduler cosine ^
--lr_warmup_steps 150 ^
--max_data_loader_n_workers 2 ^
--max_timestep 1000 ^
--max_train_epochs 40 ^
--min_timestep 0 ^
--mixed_precision fp16 ^
--network_alpha 20 ^
--network_args "verbose=True" "exclude_patterns=[]" ^
--network_dim 20 ^
--network_module networks.lora_wan ^
--offload_inactive_dit ^
--optimizer_type AdamW8bit ^
--output_dir "%OUT%" ^
--output_name "%OUTNAME%" ^
--persistent_data_loader_workers ^
--save_every_n_epochs 1 ^
--seed 42 ^
--t5 "%T5%" ^
--task t2v-A14B ^
--timestep_boundary 875 ^
--timestep_sampling logsnr ^
--vae "%VAE%" ^
--vae_cache_cpu ^
--vae_dtype float16 ^
--sdpa
pause
Swap in your own paths and parameters, save it as a .bat, and double-click it to launch training.
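Because the script passes --log_with tensorboard and --logging_dir, you can watch the loss curves while training runs. Here is a minimal viewer .bat, assuming TensorBoard is installed in the same venv (pip install tensorboard if it isn't) and pointed at the same LOGDIR as the training script:
@echo off
rem activate the same venv and point TensorBoard at the training LOGDIR
cd /d C:\Users\user\musubi-tuner
call venv\scripts\activate
tensorboard --logdir "C:\Users\user\musubi-tuner\logs"
pause
Then open http://localhost:6006 in your browser while training is running.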
-------------------------------------------------------
And here is a sample .toml for a tiny dataset of small images, suited to low-VRAM training:
[general]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true
bucket_no_upscale = false
[[datasets]]
image_directory = "path/to/your/dataset"
cache_directory = "path/to/your/dataset_cache_folder"
num_repeats = 1
resolution = [256,256]
Adjust as necessary and save as a .toml.
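If you are training motion and need videos in the mix (per the notes above), a video dataset block can sit alongside the image one in the same .toml. This is only a sketch: the video_directory, target_frames, and frame_extraction keys come from musubi-tuner's dataset config documentation, so confirm the names and accepted values against the repo before relying on it.
[[datasets]]
video_directory = "path/to/your/video_dataset"
cache_directory = "path/to/your/video_dataset_cache_folder"
num_repeats = 1
resolution = [256,256]
# frame counts per training clip; longer clips need more VRAM
target_frames = [1, 25, 45]
# how clips are pulled from each video ("head" takes frames from the start)
frame_extraction = "head"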
---------------------------------------
Here is a .bat to cache latents (run this and the text encoder caching step below before launching training):
@echo off
cd path\to\musubi-tuner
call venv\scripts\activate
set CUDA_VISIBLE_DEVICES=0
python src\musubi_tuner\wan_cache_latents.py --dataset_config path\to\toml --device cuda --num_workers 4 --vae path\to\VAE --batch_size 4 --vae_cache_cpu
rem optional flags: --skip_existing --keep_cache
pause
And here is a .bat to cache text encoder outputs:
@echo off
cd path\to\musubi-tuner
call venv\scripts\activate
set CUDA_VISIBLE_DEVICES=0
python src\musubi_tuner\wan_cache_text_encoder_outputs.py --dataset_config path\to\toml --device cuda --num_workers 4 --t5 path\to\t5 --batch_size 4 --fp8_t5
rem full option reference: --dataset_config DATASET_CONFIG [--device DEVICE] [--batch_size BATCH_SIZE] [--num_workers NUM_WORKERS] [--skip_existing] [--keep_cache] --t5 T5 [--fp8_t5]
pause
Let me know if I've omitted anything crucial.
Edit:
The formatting is not working for me in my browser, so to @LocalOptima: I'm so sowwy you can't read.
As for it being a re-post of the readme?
Dunno what you're smoking, but your attitude in general is that of a literal baby.
Harassing me on my profile after blocking me is... sheisty. Just like you.
I read all the documents when they were published and tested the training scripts on my data and my cards, and I wrote the executable bats myself for ease of use.
These configs are not available on GitHub, and neither are the bats.
You whinging about formatting is hilarious considering your tantrum over being asked to upload safetensors instead of zips.
Others seem to have the grace you lack:
https://i.imgur.com/lLN8j9z.png
https://i.imgur.com/10ycNow.png
The real issue with that chat was you not understanding the request in the first place, because you lack context: you haven't used a breadth of tools and apparently possess only a few dozen LoRAs, which you manually curate by hand.
Pitching a fit, attacking me, and hyper-fixating on my mixing two types of metadata as the crux of the whole issue was all extra, and all you.