Text-to-image AI models need huge amounts of image data and image annotation to turn your text into an image. Specifically, they require captioned pictures so the AI can learn how to process your request. It’s like asking someone to draw something in Pictionary. They recall what it looks like from experience and reproduce it on paper. The better the drawing, the easier it is to identify correctly.
The problem is that huge quantities of text-to-image training data come from the web. As we know, some areas of the web contain inappropriate and unsavoury material. Datasets gathered this way tend to reflect social stereotypes, oppressive viewpoints, and derogatory or otherwise harmful associations with marginalized identity groups.
Better filtration can remove such content.
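As a rough illustration of what caption filtering could look like, here is a minimal Python sketch. It assumes the dataset is a list of (image file, caption) pairs and uses a simple blocklist of terms. The names `BLOCKED_TERMS`, `is_clean` and `filter_pairs`, and the placeholder terms, are made up for this example. Real pipelines are likely to rely on curated lists or trained classifiers rather than plain word matching.

```python
import re

# Hypothetical blocklist with placeholder terms; a real one would be curated.
BLOCKED_TERMS = {"slur_a", "slur_b"}


def is_clean(caption, blocked=BLOCKED_TERMS):
    """Return True if the caption contains none of the blocked terms."""
    # Lower-case the caption and split it into simple word tokens.
    words = set(re.findall(r"[a-z0-9_']+", caption.lower()))
    return words.isdisjoint(blocked)


def filter_pairs(pairs, blocked=BLOCKED_TERMS):
    """Keep only the (image, caption) pairs whose caption passes the check."""
    return [(image, caption) for image, caption in pairs if is_clean(caption, blocked)]


# Made-up sample data, for illustration only.
sample = [
    ("img_001.jpg", "A flight attendant greeting passengers"),
    ("img_002.jpg", "A caption containing slur_a"),
]
kept = filter_pairs(sample)
print(kept)
```

Word matching like this only catches explicit terms in captions. It does nothing about an image that is harmful but innocently captioned, or about a dataset that simply contains far more pictures of one group than another.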
The models may replicate prevailing social biases and stereotypes. Examples include an overall bias towards generating images of people with lighter skin tones, and a tendency for images portraying different professions to align with Western gender stereotypes.
The output, therefore, is often racist, sexist, or toxic in some way. In early releases of these AI models, if you asked for images of a “flight attendant”, almost all the subjects would be women. Ask for pictures of a “CEO” or even a “lawyer” and you would see white men.
There are 7 ways to minimise bias: