Musubi Tuner: Introduction and Workspace Setup
Part 2 of my series on training new motions for Wan Video in Musubi Tuner.
This guide accompanies my YouTube video on this subject, and should help those who watched the video and simply want to copy settings etc.
This article is part of a series:
Fundamental concepts in LoRA Training (all models and training suites)
Musubi Tuner - organisation of workspace, datasets and commands (this article)
Captions
Configuration files
Training and monitoring with Tensorboard
Training outputs and inference testing
This part assumes you've read and gained an understanding of the basic concepts of LoRA training. With that done, let's get into the details of using Musubi Tuner to train your new motion.
What is Musubi Tuner?
Musubi Tuner is a suite of Python scripts that lets you fine-tune AI video models, extending what they can do through the creation of motion or visual LoRAs. It was developed by Kohya S. (kohya-ss on GitHub), who also developed kohya_ss, a popular LoRA training suite for Stable Diffusion image models.
There are a number of suites available for video training. Musubi Tuner is the only one I've used for Wan Video, so I can't compare it with the others on ease of use or quality of results.
Hardware Prerequisites
In terms of hardware, there's a recommended minimum spec for your PC or hosted online environment. Opinions on this vary, but this is my take:
NVIDIA graphics card with at least 16 GB of VRAM (24 GB is ideal; 32 GB lets you go beyond)
64 GB system RAM (128 GB ideal)
Modern multicore CPU
Better hardware lets you do more, faster: longer sequences or higher resolutions, for example. So your hardware won't be a barrier so much as a limiter.
Installation
Installing Musubi is an involved process that depends on which OS you're using (Linux or Windows) and requires some understanding of Python library management. This guide is not intended as a substitute for the official docs or other exhaustive guides on the subject.
Briefly:
Create a new Python virtual environment so your training environment neither affects nor is affected by other venvs you might be running, such as ComfyUI's
Clone the GitHub repo, or download the release and unzip it into the venv directory
Run pip install -r requirements.txt to install all necessary dependencies
Ensure you have an appropriate CUDA version installed. This is not necessarily the most recent release, which can be too far ahead of the latest stable PyTorch, and all other libraries will depend on that version of PyTorch
Python "dependency hell" is beyond the scope of this article! But as mentioned above, everything is "downstream" of your CUDA version: it determines everything else, so start there and work through PyTorch and the libraries that depend on it.
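For a typical Linux setup, the steps above might look like the sketch below. The repo URL is the official kohya-ss project; the PyTorch index URL depends on your installed CUDA version, so check pytorch.org before copying (the cu121 URL here is only an example):

```shell
# 1. Create an isolated venv so other environments (e.g. ComfyUI's) are untouched
python3 -m venv musubi-venv
source musubi-venv/bin/activate

# 2. Clone the repository
git clone https://github.com/kohya-ss/musubi-tuner.git
cd musubi-tuner

# 3. Install the project's dependencies into the venv
pip install -r requirements.txt

# 4. Install a PyTorch build matched to your CUDA version
#    (example shown for CUDA 12.1; see pytorch.org for the right index URL)
pip install torch --index-url https://download.pytorch.org/whl/cu121
```

Activating the venv first means every pip install lands inside it, which is what keeps your ComfyUI or other environments safe.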
Workspace Organisation
Once installation is complete and all dependency errors are resolved, open the base folder in Visual Studio Code or another IDE that lets you view and organise your files and folders.
I highly recommend using a code/file viewer and organiser like VSCode: LoRA training involves managing lots of files for even a single LoRA, and as you train multiple LoRAs the problem multiplies significantly. You'll drive yourself mad trying to make do with a plain text editor.
I recommend that you create the following folders either under the musubi base folder, or in a separate folder:
captions - a central place to store your caption master files
configs - a place to keep your config files (one per LoRA)
projects - a place to keep your command logs for each LoRA and version
data - the base folder for your datasets (training videos)
custom_scripts - a folder to keep your own scripts such as helpers for captions, etc.
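Assuming you create the folders under the musubi base folder, the layout above can be set up in a couple of commands; "my_motion" below is a hypothetical LoRA project name, used only to show per-LoRA subfolders:

```shell
# Create the user folders described above (run from the musubi base folder)
mkdir -p captions configs projects data custom_scripts

# Per-LoRA subfolders, e.g. for a hypothetical LoRA called "my_motion"
mkdir -p data/my_motion configs/my_motion projects/my_motion
```

Keeping one subfolder per LoRA in data, configs and projects makes it obvious which dataset, config and command log belong together when you're juggling several LoRAs.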
Creating these user folders under the base musubi folder lets you use relative paths in your commands, but Git will detect them as changes, so you'll need to tell it to ignore them if you want to pull later updates from Musubi's developer. If you use an external folder structure instead, you'll need to include full paths in your scripts.
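If the folders do live under the musubi base folder, one way to keep git pull clean is to list them in .git/info/exclude, which works like .gitignore but stays local to your clone rather than touching a tracked file. A sketch using a throwaway repo (in practice, run the printf line inside your musubi-tuner clone):

```shell
# Demo in a throwaway repo; in your real setup, skip the init and run
# the printf line from inside the musubi-tuner base folder
git init -q demo-repo
cd demo-repo
mkdir -p captions configs projects data custom_scripts
touch captions/example.txt

# Add the user folders to the local-only exclude list
printf '%s\n' captions/ configs/ projects/ data/ custom_scripts/ >> .git/info/exclude

git status --porcelain   # prints nothing: the user folders are now ignored
```

Because .git/info/exclude is never committed, the upstream repo stays untouched and future pulls won't conflict with your workspace.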
OK, with your environment set up and your workspace organised, we'll next talk about arranging your dataset: the set of video excerpts that will be the source of your motion training.
(Continued in Part 3)
If you found this article helpful then please consider supporting me at my RiotModels Page, and your reward is exclusive explicit video content of the Seven Sisters of Love.
