This blog post was originally posted on the Microsoft AI Blog.
Microsoft announced Tuesday that it has updated its facial recognition technology with significant improvements in the system’s ability to recognize gender across skin tones.
That improvement addresses recent concerns that commercially available facial recognition technologies recognized the gender of people with lighter skin tones more accurately than that of people with darker skin tones, and that they performed best on males with lighter skin and worst on females with darker skin.
With the new improvements, Microsoft said it was able to reduce the error rates for men and women with darker skin by up to 20 times. For all women, the company said the error rates were reduced by nine times. Overall, the company said that these improvements significantly reduced accuracy differences across demographic groups.
The higher error rates on females with darker skin highlight an industrywide challenge: Artificial intelligence technologies are only as good as the data used to train them. If a facial recognition system is to perform well across all people, the training dataset needs to represent a diversity of skin tones as well as factors such as hairstyle, jewelry and eyewear.
The team responsible for the development of facial recognition technology at Microsoft, which is available to customers as the Face API via Azure Cognitive Services, worked with experts on bias and fairness across Microsoft to improve a system called the gender classifier, focusing specifically on getting better results for all skin tones.
The Face API team made three major changes. They expanded and revised the training and benchmark datasets, launched new data collection efforts that further improved the training data by focusing specifically on skin tone, gender and age, and improved the classifier to produce higher-precision results.
“We had conversations about different ways to detect bias and operationalize fairness. We talked about data collection efforts to diversify the training data. We talked about different strategies to internally test our systems before we deploy them,” said Hanna Wallach, a senior researcher in Microsoft’s New York research lab and an expert on fairness, accountability and transparency in AI systems.
Wallach and her colleagues provided “a more nuanced understanding of bias,” said Cornelia Carapcea, a principal program manager on the Cognitive Services team, and helped her team create a more robust dataset “that held us accountable across skin tones.”
Beyond technical challenges
Ece Kamar is a senior researcher in Microsoft's research lab in Redmond, Washington. Her research focuses on AI tools that help engineers identify blind spots in training data, such as the underrepresentation of darker-skinned women, which may lead to AI systems with unacceptable error rates on gender classification tasks.
She said that improving the performance of the gender classifier in the Face API was mainly a technical challenge. “Collecting more data that captures the diversity of our world and being careful about how to measure performance are important steps toward mitigating these issues,” she said.
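One common way to be careful about measuring performance is disaggregated evaluation: reporting error rates separately for each skin tone and gender subgroup rather than as a single overall number. The sketch below only illustrates that idea and is not Microsoft's evaluation code. The function names, subgroup labels and toy records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records: (subgroup, true_label, predicted_label).
# These are made-up examples, not real Face API results.
RECORDS = [
    ("darker_female", "female", "male"),
    ("darker_female", "female", "male"),
    ("darker_female", "female", "female"),
    ("darker_female", "female", "female"),
    ("lighter_male", "male", "female"),
    ("lighter_male", "male", "male"),
    ("lighter_male", "male", "male"),
    ("lighter_male", "male", "male"),
]


def error_rates_by_group(records):
    """Return {subgroup: error_rate}, computed separately for each subgroup."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, true_label, predicted_label in records:
        totals[group] += 1
        if predicted_label != true_label:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def worst_to_best_ratio(rates):
    """Ratio of the highest subgroup error rate to the lowest (inf if the lowest is 0)."""
    worst = max(rates.values())
    best = min(rates.values())
    return float("inf") if best == 0 else worst / best


# A single overall error rate hides how errors are distributed across subgroups.
overall_error = sum(p != t for _, t, p in RECORDS) / len(RECORDS)
rates = error_rates_by_group(RECORDS)
ratio = worst_to_best_ratio(rates)
print(f"overall error rate: {overall_error:.3f}")
print(f"per-subgroup error rates: {rates}")
print(f"worst/best ratio: {ratio:.1f}")
```

In this toy data the overall error rate is 0.375, but the per-subgroup rates show the hypothetical darker-skinned female group misclassified twice as often as the lighter-skinned male group. The worst-to-best ratio is just one simple, assumed way to summarize such a gap.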
A more nuanced challenge, she said, is learning how, and when, to intervene in AI systems that reflect and amplify societal biases, not because of incomplete datasets or algorithmic inadequacies, but because human societies are biased.
“If we are training machine learning systems to mimic decisions made in a biased society, using data generated by that society, then those systems will necessarily reproduce its biases,” explained Wallach.
Wallach’s team is developing best practices for the detection and mitigation of bias and unfairness along the entire development pipeline of these types of AI-powered products and services, from idea creation and data collection to model training, deployment and monitoring.
“This is an opportunity to really think about what values we are reflecting in our systems,” said Wallach, “and whether they are the values we want to be reflecting in our systems.”
Searching to find answers
For example, if you do a web search for the word “CEO,” chances are you’ll get information about the senior leadership position in companies and organizations around the world, including a handful of images – most likely of men.
That’s not surprising, since less than 5 percent of Fortune 500 CEOs are women.
At Microsoft, that has prompted an ongoing exploration by engineers on the Bing search engine team, in collaboration with experts such as Wallach, into how best to also surface results that reflect the active discussion in boardrooms, throughout academia and on social media about the dearth of female CEOs – and the efforts to change that.
“When we think about this kind of bias, we start to think about how do we educate our users on the bias, how do we explain to users what is going on in society and how do we potentially help users explore those things,” said Michael Golebiewski, a principal program manager on the Bing team.
“It is an area where we don’t have all the answers yet. It is an area where we are starting to think about how we have those conversations with the user as they are searching.”