What is AI training bias? Do AI models really have it? A simple picture can explain this concept better than words. Let us start with a super simple prompt.
Prompt: Hyperrealistic side view of two-college-student standing on floor: (Miguel), before (Shilpa).
CFG: 100% (could be anywhere from 85% to 100%)
Woah! What just happened here? First of all, the picture is not exactly realistic-photograph material. But that is not the point. The real point is that in my prompt I said "(Miguel), before (Shilpa)". I deliberately did not specify their genders, their ethnicities, or even what they should be wearing. How did the AI model then know to draw a Hispanic male student standing before an Indian female student? It is because the AI knows that "Miguel" is a very common name for a Hispanic male and "Shilpa" is a common name for a North Indian female. Then, because the prompt said "college-student", the AI concluded that the two individuals are college-aged young adults. By the way, the reason the floor is empty and not filled with many students is that the prompt said "two-college-student" and not "two college students", which would have made the AI draw many students.
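A quick aside on that CFG setting: it controls how strongly the sampler follows the prompt versus its unconditioned tendencies. Under the hood, classifier-free guidance blends two noise predictions. Here is a minimal numeric sketch of the standard CFG formula, using made-up numbers purely for illustration (this is not the actual model's output):

```python
import numpy as np

def cfg_combine(uncond, cond, scale):
    """Classifier-free guidance: push the prediction away from the
    unconditional output and toward the prompt-conditioned output."""
    return uncond + scale * (cond - uncond)

# Made-up noise predictions, for illustration only
uncond = np.array([0.1, 0.2])
cond = np.array([0.5, 0.9])

print(cfg_combine(uncond, cond, 1.0))  # scale 1 -> exactly the conditional prediction
print(cfg_combine(uncond, cond, 7.5))  # higher scale -> exaggerated prompt influence
```

The higher the scale (or percentage, in UIs that express it that way), the harder the model is pushed toward the prompt, which is also why very high CFG amplifies whatever associations, including biases, the names carry.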
This is all good. We were able to describe the scene very cleverly using a short prompt. But where is the training bias in this? Actually, there is already some training bias influencing this picture. If you look carefully, Shilpa's hair is colored, and she is wearing a short top and tight jeans, all typical of a North Indian female of her age. Let us change the prompt slightly this time to show the bias even more clearly.
Prompt: Hyperrealistic side view of two-college-student kneeling on floor: (Rajesh), before (Gayathri).
CFG: 90% (This change doesn't really matter for this discussion.)
Again, not the best image. But this time you will start to notice some changes in the scene. Gayathri, a common name for a traditional, conservative Hindu Iyengar girl, is wearing a somewhat conservative dress. Her face looks more like that of a Brahmin girl, with a skin tone common to South Indians. Rajesh, too, is dressed more in the style of an Indian young adult. Now we are starting to see AI training bias: the models must have been trained on images of characters with similar skin tones, facial features, and dressing styles. But this is still not definitive, so let us try another similar prompt.
Prompt: Hyperrealistic side view of two-college-student kneeling on floor: (Salim), before (Gayathri).
This is a very good example of AI training bias. Just because I changed the name of the male student to "Salim", a common name for a Muslim man, the scene changed to that of a prayer. Salim is seen praying like a typical Muslim instead of kneeling. And just because the scene included a Muslim, even Gayathri, who is an Iyengar Brahmin, is also praying in the Muslim style, and on a prayer mat at that. Not only that: she now looks like a Muslim girl, and her footwear is gone. I guess she got converted to Islam 🤣. You will also notice similar shifts in the background across these scenes. The first scene had simple, modern college-style windows. Then the scene started to look more like a South Indian college floor. But this scene clearly shows cultural elements of a Muslim college building, like the wooden bench and the wooden doors and windows.
This not only shows that AI models have training bias; it is also something we can use to our advantage. For example, instead of saying "(a man, Hispanic, muscular, wearing traditional Hispanic clothes), before (a girl, North Indian, svelte, wearing modern attire)", we can simply say "(Miguel), before (Shilpa)".
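To see just how much the names compress the prompt, here is a toy comparison using a rough word split (real tokenizers such as CLIP's BPE count differently, so treat this only as a proxy):

```python
# Long-form prompt spelling out every attribute explicitly
long_form = ("(a man, Hispanic, muscular, wearing traditional Hispanic clothes), "
             "before (a girl, North Indian, svelte, wearing modern attire)")

# Name-based shorthand relying on the model's learned associations
short_form = "(Miguel), before (Shilpa)"

# Crude word counts as a proxy for prompt length
print(len(long_form.split()), "vs", len(short_form.split()))
```

Shorter prompts matter in practice: CLIP-based text encoders have a hard token limit, so every attribute a name carries for free leaves room for describing the rest of the scene.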
Finally, let us try to describe a complex scene: a photoshoot of seven hockey players.
Prompt: (4K, Hyperrealistic, highly-detailed, professional-group-photoshoot, seven-college-hockey-player). standing on floor from left to right:(Viktor-Tikhonov), (Gayathri-Surivirala), (Marcus-Foligno), (Shilpa-Shetty), (Dinesh-Menon), (Sarah-William), (Antonio-Pierce).
You will notice that the prompt is surprisingly short and powerful for such a tightly controlled scene: there are exactly seven players, three female and four male, though the order is not quite as specified and the ethnicities are only mostly right. This is because a CFG of 90% gives the model some wiggle room, and with many characters in the scene there will be some bleed of attributes, especially when some names are more definitive than others. The overall quality of the picture is also poor, despite prompts and settings that should have yielded a rich photograph. This is a clear indication that the AI is not fully sure what it must focus on. This is where inline prompt weights can help, which will be another article for us to explore.
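As a small preview of inline weights: many SD front ends (Automatic1111-style UIs and compatible tools) accept a `(term:weight)` syntax that boosts or damps attention on a term. The exact syntax and its effect vary by UI, so treat this helper as a sketch with my own naming, not a standard API:

```python
def weighted(term: str, weight: float = 1.0) -> str:
    """Automatic1111-style inline weight: (term:1.3) boosts a term,
    (term:0.8) damps it; weight 1.0 means leave the term bare."""
    return f"({term}:{weight})" if weight != 1.0 else term

players = ["Viktor-Tikhonov", "Gayathri-Surivirala", "Marcus-Foligno"]

# Boost the name the model seems least sure about, leave the rest alone
prompt = ", ".join([weighted(players[0]),
                    weighted(players[1], 1.3),
                    weighted(players[2])])
print(prompt)  # Viktor-Tikhonov, (Gayathri-Surivirala:1.3), Marcus-Foligno
```

Selectively boosting the weakest name is one way to fight attribute bleed in multi-character scenes without lengthening the prompt.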
What about SD 1.5?
I added this section after seeing some questions on Reddit about using this technique in SD 1.5-based models. Yes, this technique of using names for characters works on SD 1.5-based checkpoints as well, but the prompting style should be different. Here is a simplified example:
Prompt: Hyperrealistic side-view of three-college-student-at-park: Miguel, Henrik, Hiroshi.
Cfg scale: 26
Clip skip: 2
Base model: SD1
I have noticed that avoiding plurals works better with almost all checkpoints, unless you really want many subjects. SD 1.5 also works better when related keywords are joined with hyphens. So "side-view" and "three-college-student-at-park" will work better than "side view of three college students in a park".
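The hyphen-joining rule above is easy to automate. Here is a tiny helper (my own sketch; singularizing words like "students" → "student" is still done by hand, per the advice above):

```python
def hyphenate(phrase: str) -> str:
    """Join a multi-word keyword phrase into one hyphenated token,
    which SD 1.5 checkpoints tend to treat as a single bound concept."""
    return "-".join(phrase.split())

print(hyphenate("side view"))                      # side-view
print(hyphenate("three college student at park"))  # three-college-student-at-park
```

This keeps each concept glued together so the text encoder is less likely to scatter its attributes across the scene.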
Brackets without differing inline weights in SD 1.5 force the checkpoint to attempt to give equal weight to all names, which may result in gender/ethnicity attributes bleeding from strongly indicative names to weaker ones, especially when the CFG is high. So I did not bracket the character names.
More importantly, the character names should convey clear gender/ethnicity cues to the AI model. I asked ChatGPT to suggest some good ones here.
DreamShaper 8 (same prompt, same settings)
Hope this is helpful. Have you noticed other similar training biases? If you have other tips, suggestions, or questions, please drop a comment below.