Background
I plan to train a character LoRA for Annie on the Wan 2.2 video model, for animation production in a realistic 3D style.
I have never trained a LoRA before, but today I deployed DiffSynth-Studio and ran one of the official example training pipelines to completion.
Now I'm ready to start training my own Annie LoRA, but I still have many questions about building the dataset and writing captions.
I want to use the Annie LoRA with prompts such as:
“Annie is reading a book by a stream, while another character is fishing nearby.”
In the generated results, I want:
Annie's appearance to stay accurate and consistent across generations.
Other characters not to inherit Annie's visual features.
I. Training Dataset – Image-related Questions
1.1 Should the training samples include close-up face shots?
1.2 Should the training samples include upper-body shots?
1.3 Should the training samples include full-body shots?
1.4 Should the dataset include images with various facial expressions (e.g., crying, smiling, angry)?
1.5 Should back-view images be included? (I can provide them.)
1.6 Should 360° multi-angle views (front, back, left, right, above, below, etc.) be included? (I can provide them.)
1.7 Should the dataset include various poses (standing, sitting, crouching, running, jumping, etc.)? (I can provide them.)
1.8 Should it include different outfits (styles, colors, etc.)? (I can provide them.)
1.9 Should it include different hairstyles (long, short, various styles)? (I can provide them.)
1.10 Should images use solid-color backgrounds (pure white, gray, black, etc.)? (I can provide them.)
1.11 Should hats be avoided as much as possible in the training set?
1.12 Are there any other important suggestions for image data collection?
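To make section I concrete, here is a minimal sketch of how I plan to track what the image set actually covers. It assumes a hand-written manifest with hypothetical fields (`framing`, `view`, `expression`, `hat`) that I made up for illustration; nothing here is required by DiffSynth-Studio or Wan 2.2.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical per-image attributes: the field names and values are my own
# labels, not anything DiffSynth-Studio or Wan 2.2 requires.
@dataclass(frozen=True)
class Sample:
    path: str
    framing: str      # "close-up", "upper-body", "full-body"
    view: str         # "front", "back", "left", "right", ...
    expression: str   # "neutral", "smiling", "crying", "angry", ...
    hat: bool = False

def coverage(samples, field):
    """Count how many samples use each value of `field`."""
    return Counter(getattr(s, field) for s in samples)

def missing(samples, field, required):
    """Return the required values of `field` that no sample covers, sorted."""
    present = {getattr(s, field) for s in samples}
    return sorted(set(required) - present)

# Placeholder manifest; the file names are illustrative only.
samples = [
    Sample("annie_001.png", "close-up", "front", "smiling"),
    Sample("annie_002.png", "upper-body", "left", "neutral"),
    Sample("annie_003.png", "full-body", "back", "neutral", hat=True),
    Sample("annie_004.png", "full-body", "front", "angry"),
]

print(coverage(samples, "framing"))
print(missing(samples, "view", ["front", "back", "left", "right"]))  # → ['right']
hat_share = sum(s.hat for s in samples) / len(samples)
print(f"share of images with a hat: {hat_share:.0%}")  # → 25%
```

If the answers to 1.1 to 1.11 point to concrete targets (for example, a maximum share of images with hats), the same helpers could check them before training starts.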
II. Training Dataset – Caption / Description-related Questions
2.1 Should the character name (“Annie”) always appear at the beginning of the caption?
2.2 Should captions describe facial details (eyes, mouth, nose, face shape, etc.)?
2.3 Should captions specify shot type and viewing angle (wide shot, close-up, left side, right side, overhead, etc.)?
2.4 Should facial expressions be described?
2.5 Should poses or actions be described?
2.6 Should clothing (style, color, etc.) be described?
2.7 If the hairstyle is fixed, should it still be mentioned in the caption?
2.8 If the hairstyle is not fixed, should it be explicitly described?
2.9 If a hat appears, should it be clearly noted in the caption?
2.10 Should the background (solid color or scene) be described?
2.11 Should the art style (“realistic 3D”) be explicitly stated?
2.12 Should gender be mentioned?
2.13 Should age be specified?
2.14 Should body type/physique be described?
2.15 Are there any other important captioning recommendations?
2.16 To prevent other characters from inheriting Annie’s appearance, should captions emphasize her unique traits? (e.g., “Only Annie has red hair.”)
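For reference, this is a minimal sketch of the caption template I am currently considering. The field order, putting the name first, and leaving out a fixed hairstyle are my own assumptions, and they are exactly what questions 2.1 to 2.16 ask about; the `Shot` fields and the `build_caption` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

TRIGGER = "Annie"  # character name, placed first (the convention asked about in 2.1)

@dataclass
class Shot:
    view: str                    # shot type and angle, e.g. "Close-up, front view"
    expression: str              # e.g. "smiling"
    pose: str                    # e.g. "reading a book"
    outfit: str                  # e.g. "a green raincoat"
    hair: Optional[str] = None   # None when the hairstyle is fixed (question 2.7)
    hat: Optional[str] = None    # e.g. "a straw hat"; None when no hat is visible
    background: str = "plain gray background"

def build_caption(shot: Shot, style: str = "realistic 3D render") -> str:
    """Join the hypothetical fields into one caption, character name first."""
    subject = f"{TRIGGER}, {shot.expression}, {shot.pose}, wearing {shot.outfit}"
    if shot.hat:
        subject += f" and {shot.hat}"
    if shot.hair:
        subject += f", with {shot.hair}"
    return f"{subject}. {shot.view}, {shot.background}, {style}."

print(build_caption(Shot(view="Upper-body shot, left side view",
                         expression="smiling",
                         pose="reading a book",
                         outfit="a green raincoat",
                         hat="a straw hat")))
```

With these inputs the caption reads "Annie, smiling, reading a book, wearing a green raincoat and a straw hat. Upper-body shot, left side view, plain gray background, realistic 3D render."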
III. Training Dataset – Video Sample-related Questions
3.1 Should video clips be included (e.g., a 360° rotation around Annie)?
3.2 Should video clips with slow dolly-in/dolly-out camera movements (the camera moving slowly toward or away from Annie) be included?
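If video clips are worth including, this is roughly how I would list them for training. It is a minimal sketch that assumes the dataset is described by a CSV file with a `video` column and a `prompt` column; please treat that layout as my assumption and check it against the metadata file in the official example dataset. The file names and captions are placeholders.

```python
import csv
import io

# Hypothetical clip list: file names and captions are placeholders.
clips = [
    ("annie_turnaround_360.mp4",
     "Annie, neutral expression, standing still. The camera orbits 360 degrees "
     "around her, plain gray background, realistic 3D render."),
    ("annie_dolly_in.mp4",
     "Annie, smiling, sitting on a chair. The camera slowly dollies in toward "
     "her face, plain gray background, realistic 3D render."),
]

def write_metadata(rows, fp):
    """Write (file, caption) rows as CSV under an assumed `video,prompt` header."""
    writer = csv.writer(fp)
    writer.writerow(["video", "prompt"])
    writer.writerows(rows)  # captions containing commas are quoted automatically

# In-memory buffer for illustration; on disk, open "metadata.csv" with newline="".
buf = io.StringIO()
write_metadata(clips, buf)
print(buf.getvalue())
```

Each caption names the camera motion explicitly so the movement is not attributed to Annie herself; whether that is needed is part of what I am asking in 3.1 and 3.2.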
Thank you so much for any help. Even an answer to just one of these questions would be greatly appreciated! 🙏
