Not sure if there are any articles on this yet, but I'll add one to help.
I use diffusion-pipe; it currently only works on Linux and WSL2, so on Windows, install Ubuntu from the Microsoft Store.
To create character LoRAs, I recommend having plenty of face shots. Be sure to CROP OUT any text (I've ruined multiple LoRAs because of this; the output turns into an intro vid with titles). I usually crop so that just the character is in the shot, and keep 1 or 2 images with other people (usually of the opposite sex, so the characters don't mix at all). Images with extreme aspect ratios like 3:1 will be skipped, so be reasonable. If you want more extreme images to be recognized, change max_ar in your dataset.toml to 2.5.
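To see which images would be dropped before you start a run, here is a rough Python sketch. It assumes the ratio is simply width / height and that anything outside min_ar..max_ar gets skipped; the 0.5 and 2.0 values, the function names, and the image sizes are placeholders for illustration, not something taken from diffusion-pipe itself.

```python
# Sketch: flag images whose aspect ratio falls outside the allowed range.
# Assumption: ratio = width / height, compared against min_ar and max_ar.
# Check your own dataset.toml and the diffusion-pipe docs for the real rule.

def aspect_ratio(width: int, height: int) -> float:
    return width / height

def is_skipped(width: int, height: int, min_ar: float = 0.5, max_ar: float = 2.0) -> bool:
    ar = aspect_ratio(width, height)
    return ar < min_ar or ar > max_ar

# Hypothetical image sizes: name -> (width, height)
sizes = {
    "face_closeup.png": (512, 512),  # 1:1
    "wide_shot.png": (1200, 500),    # 2.4:1
    "banner.png": (1536, 512),       # 3:1
}

for max_ar in (2.0, 2.5):
    skipped = [name for name, (w, h) in sizes.items()
               if is_skipped(w, h, max_ar=max_ar)]
    print(f"max_ar={max_ar}: skipped {skipped}")
```

With these made-up sizes, raising max_ar to 2.5 rescues the 2.4:1 image, but the 3:1 one is still out, so cropping is the better fix for anything that wide.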
As long as the photos are free of visible banding you should be fine, since you'll be training at 512x512, but upscaling is usually encouraged for training quality LoRAs. 30 photos is plenty, but you can use a lot more; I've gone up to 300. It just means you'll need to train for more steps.
I've been using Florence-2 for captioning just because it's so fast, but I have tried llama-vision and had similarly good results. When I start trying videos I'll switch to Meta's Apollo model, which has been working well for short vids.
I get good results from 500-1500 steps, and I set my max epochs to 30 so it will stop on its own. I bump the learning rate up to 2e-4. Training on a 40-series card takes about 1 to 2 hours; a good portion of that is caching the latents, which will be much quicker with fewer photos.
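To get a feel for how photo count, epochs, and steps relate, here is a small back-of-the-envelope sketch. It assumes one training step per batch, with no gradient accumulation and a single GPU; batch_size and num_repeats here are illustrative parameters, and diffusion-pipe may count steps differently depending on your config.

```python
import math

# Sketch: rough step math, assuming one step per batch
# (no gradient accumulation, single GPU). Illustrative only.

def total_steps(num_images: int, epochs: int,
                batch_size: int = 1, num_repeats: int = 1) -> int:
    steps_per_epoch = math.ceil(num_images * num_repeats / batch_size)
    return steps_per_epoch * epochs

def epochs_for_steps(num_images: int, target_steps: int,
                     batch_size: int = 1, num_repeats: int = 1) -> int:
    steps_per_epoch = math.ceil(num_images * num_repeats / batch_size)
    return math.ceil(target_steps / steps_per_epoch)

print(total_steps(30, 30))          # 30 photos, 30 epochs
print(total_steps(300, 30))         # 300 photos, 30 epochs
print(epochs_for_steps(300, 1500))  # epochs needed to reach 1500 steps
```

Under those assumptions, 30 photos at 30 epochs lands at 900 steps, right in the 500-1500 range, while 300 photos would reach 1500 steps in about 5 epochs.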
Usually I'm at 19-20 GB of VRAM when doing 512, and I get up to 23 GB when doing 1024.
Honestly, this model is pretty easy to train. There's one woman I've always had trouble with, and it has worked on each of her datasets, whereas before I could only get 1 to work with Pony and 2 with Flux.
Start off small!
P.S. Also realize that training characters, rather than movements, might be worthless in the coming months when they bring out I2V (image-to-video).
How to install diffusion-pipe? ---- Install Ubuntu; you may need to go into your BIOS and enable the hypervisor (virtualization). Then use this guide