WAN - training, loras, workflows, thoughts :)

Hello, my dear readers!

There is a lot of stuff so I'll try to make it brief! :)

1) My new WAN loras have just dropped at: https://huggingface.co/malcolmrey/wan/tree/main/wan2.1

There are 130+ loras, so have fun :) (40 GB)

2) Workflow for WAN 2.1 with all my loras listed in the powerlora:

https://huggingface.co/datasets/malcolmrey/workflows/tree/main/WAN

This is the workflow that I use for 2.1, it has all the available loras listed there.

There are also some Notes left to explain some parts a bit :)

3) How to train WAN 2.1 loras the way I do it?

Nowadays it is so simple that tutorials can be very short and straightforward.

Currently I use the AI Toolkit:

ostris/ai-toolkit: The ultimate training toolkit for finetuning diffusion models

Most of the settings are default, but I did tweak some minor things for my convenience.

The quickest way to start a new training is to replace the configuration in the GUI and edit two values directly in it. My configuration is attached to this article -> aitoolkit-config.txt

After you install AI Toolkit and run the GUI,

i) click on New Job

ii) click on Show Advanced

iii) you will see the job configuration as editable text, replace all of it with the content from my file

replace name: "wan_NAMEYOURMODEL_v1" with your model name

and replace folder_path: "c:\\Development\\ai-toolkit\\datasets/DATASETFORYOURMODEL" with the folder of your dataset (you should have the folder prepared by now)

iv) then click Show Simple and confirm that all looks good

v) and then just click Create Job (I don't even click Show Simple anymore)
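If you train often, the two replacements from step iii can also be scripted before pasting. This is only a rough sketch: the two-line `sample` stands in for the real aitoolkit-config.txt (its full content is not shown here), and `fill_config` is a hypothetical helper name, not part of AI Toolkit.

```python
import re


def fill_config(template: str, model_name: str, dataset_dir: str) -> str:
    """Return the config text with the model name and dataset folder filled in."""
    placeholder = "wan_NAMEYOURMODEL_v1"
    if placeholder not in template:
        raise ValueError("name placeholder not found in the config text")
    # Forward slashes avoid backslash escaping inside the quoted value.
    dataset_dir = dataset_dir.replace("\\", "/")
    out = template.replace(placeholder, model_name)
    # Replace the whole quoted value on every folder_path line.
    out, count = re.subn(
        r'(folder_path:\s*)"[^"\n]*"',
        lambda m: f'{m.group(1)}"{dataset_dir}"',
        out,
    )
    if count == 0:
        raise ValueError("no folder_path line found in the config text")
    return out


# Stand-in for the attached file: only the two lines that need editing.
sample = (
    'name: "wan_NAMEYOURMODEL_v1"\n'
    'folder_path: "c:\\\\Development\\\\ai-toolkit\\\\datasets/DATASETFORYOURMODEL"\n'
)
print(fill_config(sample, "wan_myperson_v1", "c:/Development/ai-toolkit/datasets/myperson"))
```

The output is what you would paste into the Show Advanced view. Everything else in the config stays untouched.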

A few words about the training:

The most important thing is the dataset. It is pretty much what makes or breaks the model.

After extensive tests and trainings I can confirm that the default learning rate is the way to go (I did experiment with others, but they were hit or miss).

However, I nailed down what seems to be the sweet spot: 20 images in the dataset and 2500 steps.

The training is a function of the number of images and the number of steps. In simple terms it just iterates over the images X times until it reaches those 2500 steps in total. Is it a simple 2500/20=125 per image? Maybe, I don't know. What I do know is that if you increase the number of images, you would realistically need to increase the number of steps as well. It is not linear, so 40 images does not translate to 5000 steps.

I tried many variations and decided that 20 images is the best number for 2500 steps (anything from 18 or so up to 22-25 is still fine).
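To put rough numbers on that, here is a tiny sketch of the steps-per-image arithmetic. It assumes each step processes exactly one image (batch size 1), which is my assumption and not something I have verified in the trainer, so treat the output as a ballpark only.

```python
def passes_per_image(total_steps: int, num_images: int, batch_size: int = 1) -> float:
    """Average number of times each image is seen, assuming uniform sampling."""
    if num_images <= 0 or batch_size <= 0:
        raise ValueError("num_images and batch_size must be positive")
    return total_steps * batch_size / num_images


# Compare a few dataset sizes at the 2500-step sweet spot.
for n in (18, 20, 25, 40):
    print(f"{n} images -> {passes_per_image(2500, n):.1f} passes per image")
```

With 20 images this gives 125 passes per image. With 40 images at the same 2500 steps it drops to 62.5, which fits the observation that more images need more steps, even if the relationship is not linear.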

I had good results with many images, but I had to go to 4000 steps. The results weren't better, they were just very good. But that was 1500 more steps to do, and I don't think it is necessary.

Pick images where the face really does resemble the person. Sometimes the lighting, makeup, or pose makes the person less recognizable.

WAN is much better than anything else at picking up on details. If you give it great images then the results will be great; if you give it dubious images, the results will be dubious too.

There is no need for captions at all.

Variety in the images is important. Do not use only "red carpet" shots, because that will limit WAN's imagination when you use basic prompts.

I've noticed that it is also beneficial to actually add lower quality images AS LONG as the face is very recognizable. The imperfections are trained into the model. In my "red carpet" example, you would get mainly red carpet quality outputs without heavy prompting (which might be fine for some, but it might be better to add some screengrabs from interviews/movies/candid shots).
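Before starting a job it can help to sanity check the dataset folder against the 18 to 25 image range mentioned earlier. A minimal sketch, assuming the images sit directly in one folder. The extension list and the `check_dataset` name are my own choices for illustration, not anything AI Toolkit requires.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your dataset uses.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def check_dataset(folder, low: int = 18, high: int = 25):
    """Count image files in a folder and compare against the suggested range."""
    images = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    ]
    count = len(images)
    if count < low:
        verdict = "fewer images than the suggested range"
    elif count > high:
        verdict = "more images than the suggested range, consider more steps"
    else:
        verdict = "within the suggested range for 2500 steps"
    return count, verdict
```

Calling `check_dataset("datasets/myperson")` returns the image count and a short verdict. It only counts files, so picking images with good likeness and enough variety is still up to you.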

As usual (those who follow me should be familiar with this), you can train one person multiple times (using different datasets) and then use those different loras together in a prompt (see the workflow for examples). Again, there is a benefit to that: better likeness.

4) What next?

I do need to clean up my version of the WAN 2.2 workflow and make it available, as well as the one for WAN VACE.

and of course, training more loras :-)

If you have a particular priority request you can always drop it at my coffee page: Malcolm Reynolds is creating AI Models for Stable Diffusion

Other places where you can find me:

reddit:

http://reddit.com/r/malcolmrey

huggingface:

malcolmrey (Malcolm Reynolds)

Cheers and have fun using the models, the training info, the loras and the workflow :)