Why we should study biases in artificial intelligence

Tuesday, October 31, 2017

There is a lack of research quantifying how biases in AI systems damage individuals and societies.

You have just mailed your CV to get your dream job, and you are anxiously waiting for a response from the company. This could be the opportunity of a lifetime. But wait, there is a little twist: your destiny hangs in the hands of a machine learning algorithm. The company decided to outsource the preliminary filtering of CVs to an artificial intelligence (AI) company because there were too many applications to handle. I am sorry to tell you that a study showed an identical CV is 50 percent more likely to result in an interview invitation if the candidate’s name is European American than if it is African American. What do you do?

Bias is an “inclination or prejudice for or against one person or group, especially in a way considered to be unfair.” We have heard Elon Musk call AI our “biggest existential threat,” although more and more AI experts have spoken up against such pessimism, arguing that today’s AI agents are far from true intelligence.

“The pace of progress in artificial intelligence (I’m not referring to narrow AI) is incredibly fast. Unless you have direct exposure to groups like Deepmind, you have no idea how fast—it is growing at a pace close to exponential. The risk of something seriously dangerous happening is in the five-year timeframe. Ten years at most,” Musk said.

This debate will go on.

Most urgent matters of concern in AI

In the shadow of this debate, another concern lurks that gets ignored much of the time: AI systems are biased, because we have knowingly and unknowingly transmitted various forms of bias into the intelligent systems we build.

As the years pass, some form of artificially intelligent agent will enter more and more areas of your life. Your Facebook feed is an important piece of internet real estate where your social and political ideas are influenced. Imagine the newsfeed recommendation engine on Facebook (and on other platforms) being plagued by subtle biases that have crept into the system through biased datasets or biased interactions.

Self-driving cars, loan prediction engines, healthcare systems and many other systems are plagued by biases. Machine learning systems learn through data and interactions. The data accumulated by engineers and data scientists through crowdsourcing, or taken from existing open-source datasets, contains all the forms of bias humans are prone to. Thus it is fair to say that machine learning algorithms are not biased in themselves: humans are biased, and we transmit this bias into AI systems through the data we produce.

The really alarming thing is that very few people care or are making a real effort to repair these faults. Important stakeholders and companies that develop these systems show little interest in searching for and eliminating biases. To put it in plain and clear terms: if your AI service provider does not clearly explain how they have trained your system or which data has been used, you shouldn’t trust them, especially when the applications are in critical areas like medical support systems and credit approval. The first step is simply to recognise this as a problem.

A long-term strategy is needed to train data scientists to identify and correct for biases in the data whenever they build intelligent systems in critical application areas. As we keep replacing human thought processes with machine learning algorithms, we tend to rest assured that the machine is right. There is a tendency to trust machine algorithms more than humans, and this is obviously a worry.

In the short term, data scientists in the industry should emulate the practices of social scientists, who have long cultivated the professional habit of questioning data sources and the methods used to gather and analyse data. Rigorous and detail-oriented data collection brings a certain amount of context to the problem being studied. The more nuanced the data used to train algorithms, the more we can expect the resulting machine learning model to be free from bias.

Digital divide and flawed datasets

Many forms of bias come from a lack of equitable access and the digital divide. We increasingly rely on big data sources coming (mostly) from affluent societies and from people with access to digital devices and the internet. For example, the Twitter sentiment analyser you just built for the election has no representation from villages where people don’t have access to the internet. We simply assume that the data accurately reflects the ground reality, but sadly this is far from the truth.

Consider Labeled Faces in the Wild (LFW), a popular benchmark for facial recognition containing more than 13,000 face photos. About 83 percent of the photos are of white people and nearly 78 percent are of men. This is a perfect example of a flawed benchmark and a biased dataset at the same time. If your startup, company or AI vendor uses this dataset to build a face detection system, that system is flawed and, more importantly, heavily biased.
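
A first, simple defence is to measure a dataset’s composition before training on it. Below is a minimal Python sketch of such an audit; the group tags are hypothetical annotations (benchmarks like LFW do not ship with them out of the box), and the toy numbers merely echo the proportions above:

```python
from collections import Counter

def audit_composition(labels):
    """Return the share of each demographic group in a dataset.
    `labels` holds one (hypothetical) group tag per example."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

# Toy illustration with made-up tags echoing the LFW gender skew:
tags = ["male"] * 78 + ["female"] * 22
print(audit_composition(tags))  # {'male': 0.78, 'female': 0.22}
```

A report like this will not fix the skew, but it makes the skew visible before the model bakes it in.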

There is also a lack of research quantifying how biases in AI systems damage individuals and societies. We have funny anecdotes about voice recognition systems failing to recognise someone’s voice, but no concrete studies showing how the most widely used AI systems discriminate against sections of society. Acknowledging that bias exists is itself a good start.

One of the fundamental questions is: how do you define fairness? What makes an algorithm fair? Different scientific studies use different notions of algorithmic fairness, and although each appears internally consistent, they are mutually incompatible. Statistical parity is one way to define fairness: it asks whether an algorithm’s favourable decisions are distributed at the same rate across a “protected” subset of the population as across everyone else.
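
To make statistical parity concrete, here is a minimal Python sketch of how such a check might look; the function name, the “protected” label and the toy decisions are illustrative assumptions, not the API of any particular fairness library:

```python
def statistical_parity_difference(decisions, groups, protected="protected"):
    """Difference in positive-decision rates between the protected
    group and everyone else; 0.0 means perfect statistical parity."""
    def positive_rate(in_protected):
        subset = [d for d, g in zip(decisions, groups)
                  if (g == protected) == in_protected]
        return sum(subset) / len(subset)
    return positive_rate(True) - positive_rate(False)

# Toy CV-screening example: 4 of 10 protected applicants get an
# interview versus 6 of 10 others.
decisions = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
groups = ["protected"] * 10 + ["other"] * 10
print(statistical_parity_difference(decisions, groups))  # -0.2
```

A value far from zero signals that the algorithm treats the protected subset differently, which is exactly what statistical parity is designed to surface.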

Policy in AI

There can be biases that are socially acceptable and biases that are not, and extensive policy research is needed before we can conclude which biases are acceptable. Policy intervention is also necessary because most practitioners demonstrate a lack of understanding of algorithmic fairness, and management rarely goes into the details of the algorithms. It is especially difficult today for anyone to keep up with the pace of development, as firms like Google and Facebook have radically narrowed the gap between academia and industry.

Policy studies in AI also become important because there exists no mechanism to audit algorithms for “disparate impact”. Disparate impact happens when neutral-sounding rules disproportionately affect a legally protected group. People find it difficult to prove and study disparate impact even when algorithms are not involved; in legal matters, disparate impact is only illegal when there clearly exists an alternative way to carry out the procedure in question. Because of this, it will be difficult to prove that a machine learning algorithm has a disparate impact, and deep learning algorithms are in any case moving beyond our current ability to analyse them.
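
One heuristic an auditor could borrow from US employment law is the “four-fifths rule”: disparate impact is suspected when the protected group’s selection rate falls below 80 percent of the other group’s. Here is a hedged Python sketch of that check, with illustrative names and made-up numbers:

```python
def disparate_impact_ratio(decisions, groups, protected="protected"):
    """Ratio of the protected group's selection rate to everyone
    else's; under the four-fifths rule, a ratio below 0.8 is
    commonly treated as evidence of disparate impact."""
    def rate(in_protected):
        subset = [d for d, g in zip(decisions, groups)
                  if (g == protected) == in_protected]
        return sum(subset) / len(subset)
    return rate(True) / rate(False)

# Toy numbers: 20% selection rate for the protected group vs 60%.
decisions = [1, 0, 0, 0, 0] * 2 + [1, 1, 1, 0, 0] * 2
groups = ["protected"] * 10 + ["other"] * 10
ratio = disparate_impact_ratio(decisions, groups)
print(round(ratio, 2), "flagged" if ratio < 0.8 else "ok")  # 0.33 flagged
```

This is a sketch, not a legal test; as noted above, real disparate-impact claims also hinge on whether a viable alternative procedure exists.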

To sum it up, the present and future success of AI depends on our ability to:

  1. Understand bias and discrimination problems in AI systems
  2. Detect biases in AI algorithms
  3. Study and put into practice processes to build bias-free AI systems
  4. Study and put in place mechanisms to eliminate biases without taking away the power of these algorithms

(Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily reflect the views of YourStory.)