| Field | Value |
| --- | --- |
| Type | |
| Stats | 68 0 |
| Reviews | (7) |
| Published | Mar 6, 2025 |
| Base Model | |
| Training | Steps: 2,500 · Epochs: 10 |
| Usage Tips | Clip Skip: 1 · Strength: 1 |
| Hash | AutoV2 59D565CC49 |
If you find this model helpful, you can follow my Bilibili or YouTube account.
Multi-character consistency has always been a challenge in ComfyUI. Previously, I trained a separate LoRA per character to keep each single character consistent.
I then merged the datasets and annotated the captions, but the annotation strategy I chose suffered from semantic pollution, so the model did not achieve the expected results.
label:
In the photo with a pure white background, susuxi stands with hands on hips on the left side of the picture, wearing a white shirt and black pants with a yellow belt on the pants and yellow pockets on the shirt. The expression is happy and the mouth is laughing. dreamoo is in the upper body photo on the right side of the picture, looking at the gray top and red short sleeved shirt on the right side
However, testing showed that a prompt without regional control leads to feature fusion between the two characters, e.g. with:
susuxi and dreamoo are swinging on the swing,
Subsequently, inspired by In-Context LoRA, I modified the labeling method to strengthen the Flux model's perception of image regions: different prompt forms correspond to the features of different regions in the image. With this scheme, I completed the LoRA training.
label:
[Two different characters scene], <dreamoo><ssx> group photo, <ssx stands with hands on hips, wearing a white shirt and black pants, with a yellow pocket on the shirt>, <dreamoo wears a gray shirt with a red inner layer, facing right>, pure white background,
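Captions in this bracketed region format can be assembled programmatically. The sketch below is a hypothetical helper (not the author's actual tooling), assuming each character's description is keyed by its trigger word:

```python
def build_caption(char_descriptions, scene="pure white background"):
    """Assemble a training caption in the bracketed region format:
    a scene header, concatenated trigger words, one <trigger description>
    region block per character, then the background/scene description.

    char_descriptions: dict mapping trigger word -> attire/pose description,
    e.g. {"ssx": "stands with hands on hips, wearing a white shirt"}.
    """
    triggers = "".join(f"<{t}>" for t in char_descriptions)
    regions = ", ".join(
        f"<{t} {desc}>" for t, desc in char_descriptions.items()
    )
    return (
        f"[Two different characters scene], {triggers} group photo, "
        f"{regions}, {scene},"
    )

caption = build_caption({
    "ssx": "stands with hands on hips, wearing a white shirt and black pants",
    "dreamoo": "wears a gray shirt with a red inner layer, facing right",
})
print(caption)
```

Dict insertion order is preserved in Python 3.7+, so the trigger-word order in the caption matches the order characters are listed.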
I recommend writing prompts in the following format and describing the shared scene at the end; this places the different characters in the same scene more reliably.
[Two different characters in one scene], <dreamoo><ssx> Describe the scenario, <Use trigger words to describe the attire and state of character one>, <Use trigger words to describe the attire and state of character two>, overall scene description,
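The recommended inference ordering can be sketched the same way. This is a hypothetical helper (all names are assumptions, not part of any released tooling) that keeps the overall scene description last, as recommended:

```python
def build_prompt(scenario, char_states, overall_scene):
    """Build an inference prompt in the recommended order:
    header, trigger words + scenario, one region block per character,
    and the overall scene description placed last.

    char_states: dict mapping trigger word -> attire/state description.
    """
    triggers = "".join(f"<{t}>" for t in char_states)
    regions = ", ".join(f"<{t} {s}>" for t, s in char_states.items())
    return (
        f"[Two different characters in one scene], {triggers} {scenario}, "
        f"{regions}, {overall_scene},"
    )

prompt = build_prompt(
    "swinging on a swing",
    {"dreamoo": "wears a gray shirt with a red inner layer",
     "ssx": "wears a white shirt and black pants"},
    "in a sunny park",
)
print(prompt)
```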
The following are the test result images. They show that the model generalizes well and maintains character consistency.
Combined with the Wanxiang video generation model, it can also be used for video production, with very good results.