Makeup slider [Wan 2.1 T2V-14B]

https://civitai.com/models/2259797/makeup-slider
Model: Wan 2.1 T2V-14B
The strength at which each generation was performed is indicated on the video.
ai-toolkit was used for training.

Slider LoRA

If you are tormented by the question "How do I train a slider LoRA for Wan video?", then at the moment I have found two possible options:

1. Using differential_lora.py from seruva19

2. Using ai-toolkit

Differential LoRA

I started with this approach. It requires a special dataset of paired images, from which two LoRA models are then trained; in my case, with and without makeup.
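I don't know exactly how seruva19's script implements this, so here is only a minimal sketch of the general differential idea, under my own assumptions: the two paired LoRAs share the common "lora_down"/"lora_up" key layout, all weights are 2D (linear layers), and alpha equals rank; the file names are placeholders. Note that you cannot simply subtract the low-rank factors pairwise, since B1@A1 - B2@A2 is not (B1-B2)@(A1-A2); instead, subtract the full deltas and re-compress the difference with a truncated SVD:

```python
# Sketch: turn two paired LoRAs ("with makeup" / "without makeup") into a
# single slider LoRA. Assumes 2D linear-layer weights, the common
# "lora_down"/"lora_up" key layout, and alpha == rank.
import torch
from safetensors.torch import load_file, save_file

RANK = 8

pos = load_file("lora_with_makeup.safetensors")     # hypothetical file name
neg = load_file("lora_without_makeup.safetensors")  # hypothetical file name

out = {}
for key in pos:
    # LoRA weights come in pairs: "<module>.lora_down.weight" (A) and
    # "<module>.lora_up.weight" (B); the applied delta is B @ A.
    if not key.endswith("lora_down.weight"):
        continue
    up_key = key.replace("lora_down", "lora_up")
    delta = (pos[up_key].float() @ pos[key].float()
             - neg[up_key].float() @ neg[key].float())
    # Re-compress the full-rank difference to a low-rank pair via SVD.
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    out[up_key] = (u[:, :RANK] * s[:RANK]).contiguous()
    out[key] = vh[:RANK, :].contiguous()

save_file(out, "makeup_slider.safetensors")
```

An even simpler route that avoids merging altogether is to load both LoRAs at inference with opposite strengths (+s and -s), which applies the same difference.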

The task of adding or removing makeup is not new, and there are many open academic datasets with similar data. They are fine for local experiments, but not for building a model for the Civitai site, since that would violate its policy.

But I will say this: no matter what dataset I used and what parameters I trained with, the resulting slider LoRA was terrible. It did everything except change the makeup. And even when I found a LoRA that worked on one example, everything broke as soon as I changed the prompt or seed. I even tried retraining the two LoRAs on a single pair of images, but even that didn't help... So I had to look for another tool for training the slider LoRA.

ai-toolkit

I first learned about this tool from the description of this LoRA. It does indeed have a special mode for training a "Concept slider". A big advantage is that you don't need to assemble a dataset of image pairs; theoretically, you could even generate a random dataset on the fly during LoRA training.
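To make it concrete, here is a toy sketch of how I understand prompt-driven slider training (in the spirit of the Concept Sliders paper; this is not ai-toolkit's actual code, and every module here is a stand-in): a LoRA-style low-rank delta, scaled by a randomly sampled strength, is trained so that the patched model's prediction for the target class moves along the frozen base model's (positive prompt - negative prompt) direction. No image pairs are needed, only prompts:

```python
# Toy, illustrative sketch of prompt-driven slider training.
import torch
import torch.nn as nn

DIM, RANK = 16, 4

class ToyDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(DIM * 2, DIM)  # consumes latent + prompt embedding

    def forward(self, x, cond):
        return self.net(torch.cat([x, cond], dim=-1))

base = ToyDenoiser()            # frozen base model
for p in base.parameters():
    p.requires_grad_(False)

# LoRA-style low-rank delta; its output is scaled by the slider strength.
down = nn.Linear(DIM * 2, RANK, bias=False)
up = nn.Linear(RANK, DIM, bias=False)
nn.init.zeros_(up.weight)       # start as a no-op, as LoRA does

def patched(x, cond, strength):
    h = torch.cat([x, cond], dim=-1)
    return base(x, cond) + strength * up(down(h))

# Stand-ins for encoded prompts.
c_target = torch.randn(1, DIM)  # target_class: "person, makeup"
c_pos = torch.randn(1, DIM)     # positive_prompt embedding
c_neg = torch.randn(1, DIM)     # negative_prompt embedding

opt = torch.optim.AdamW(list(down.parameters()) + list(up.parameters()), lr=5e-5)

for step in range(60):
    x_t = torch.randn(1, DIM)                  # stand-in noisy latent
    s = torch.empty(1).uniform_(-1, 1).item()  # sampled slider strength
    with torch.no_grad():
        direction = base(x_t, c_pos) - base(x_t, c_neg)
        target = base(x_t, c_target) + s * direction
    loss = nn.functional.mse_loss(patched(x_t, c_target, s), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```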

I experimented with two synthetic datasets.

The first was built entirely from thispersondoesnotexist, but I wasn't very happy with the result. The LoRA worked, but if the source image already contained heavy makeup, it couldn't be removed with a negative LoRA strength. My theory is that this is because the original synthetic dataset almost never includes people with very heavy makeup.

I generated the second dataset myself with Wan, using the well-known Wan workflow. In total I generated 400 images, from 200 prompts with makeup descriptions and 200 without. Initially I expected to throw out more than half of the generations due to Wan artifacts, but surprisingly there weren't any, so everything went into training. The dataset is attached to this article.
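For illustration, here is a sketch of how such a 200/200 prompt set can be assembled. The attribute lists below are made up, not my actual prompts, and the output is just a JSON file you can feed to whatever Wan workflow you use:

```python
# Sketch: assemble paired prompt lists (200 with makeup, 200 without).
# Each prompt describes the same kinds of subjects, so the only systematic
# difference between the two halves is the makeup.
import itertools
import json
import random

random.seed(0)

subjects = ["a young woman", "a middle-aged woman", "a man", "an elderly woman"]
settings = ["studio portrait", "outdoor daylight", "indoor warm lighting"]
makeup = [
    "dramatic smoky eyeshadow, bold red lipstick, heavy eyeliner",
    "colorful eyeshadow, purple lipstick, false lashes",
    "dark lipstick, high-contrast eye makeup",
]
no_makeup = ["natural skin, no eyeshadow, no lipstick, minimal enhancement"]

def build(styles, n):
    combos = [f"{subj}, {scene}, {style}"
              for subj, scene, style in itertools.product(subjects, settings, styles)]
    return random.choices(combos, k=n)  # sampled with replacement up to n prompts

prompts = {"makeup": build(makeup, 200), "no_makeup": build(no_makeup, 200)}
with open("slider_prompts.json", "w") as f:
    json.dump(prompts, f, indent=2)
```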

Training parameters

resolution: 256
rank: 8
steps: 60
lr: 0.00005
positive_prompt: "Dramatic smoky or colorful eyeshadow, bold lipstick (red, purple, dark), heavy eyeliner, false lashes, high color contrast"
negative_prompt: "Natural skin, no eyeshadow, nude or no lipstick, minimal enhancement"
target_class: "person, makeup"
anchor_class: ""

I don't see much point in training a slider LoRA at high resolution, since it doesn't learn anything new there; 256 was perfectly fine for me. I had to raise the rank from 4 to 8 because at rank 4 the LoRA couldn't cleanly separate the makeup from the face and struggled to remove bright lips at negative strength. Otherwise everything is pretty standard. This video explains in detail what these prompts do.

Conclusion

The LoRA's effect depends heavily on the makeup present in the initial generation: if it is already quite bright, increasing the LoRA strength produces a drag-queen makeup effect. The LoRA also distorts the face somewhat, changing the face shape and accessories. I tried adding information about the environment and accessories to anchor_class, but this made the results worse, with overly exaggerated makeup; I got the best results with the field left empty. I recommend strengths between -3 and 3; this range produces the most realistic makeup.
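"Strength" here is simply the scalar the LoRA delta is multiplied by when it is applied. If you want to bake the slider into a checkpoint at a fixed (possibly negative) strength, the merge looks roughly like this; the mapping from LoRA key names to model weight names is hypothetical and depends on the trainer, and alpha is again assumed equal to rank:

```python
# Sketch: merge the slider LoRA into model weights at a fixed strength.
# A negative strength removes makeup, a positive one adds it.
from safetensors.torch import load_file

def apply_slider(model_state, lora_state, strength):
    """Return a copy of model_state with the slider merged in at `strength`."""
    merged = dict(model_state)
    for key, down in lora_state.items():
        if not key.endswith("lora_down.weight"):
            continue
        up = lora_state[key.replace("lora_down", "lora_up")]
        # Hypothetical key mapping; real naming depends on the trainer.
        target = key.replace(".lora_down.weight", ".weight")
        if target in merged:
            w = merged[target]
            merged[target] = (w.float() + strength * (up.float() @ down.float())).to(w.dtype)
    return merged

lora = load_file("makeup_slider.safetensors")
# model_state = ...  # your Wan checkpoint's state dict
# merged = apply_slider(model_state, lora, strength=-2.0)  # keep within -3..3
```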
