Machine learning is the easy part. Doing the right thing is hard.
Computers model human language largely through relationships between words. Each word is represented as a vector of numbers, and the relationships between words are captured using simple vector algebra.
For example, “man : king :: woman : queen” (man is to king as woman is to queen), or “sister : woman :: brother : man”.
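To make the vector algebra concrete, here is a minimal sketch in Python. The three-dimensional vectors are invented purely for illustration (real embeddings have hundreds of dimensions), and the `analogy` helper is a hypothetical name, not part of any library. It solves "a : b :: c : x" by looking for the word whose vector lies closest to b - a + c.

```python
import numpy as np

# Toy vectors, made up for this example; not taken from any real embedding.
vectors = {
    "man":   np.array([1.0, 0.0, 0.0]),
    "woman": np.array([0.0, 1.0, 0.0]),
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([0.0, 1.0, 1.0]),
    "apple": np.array([0.5, 0.5, -1.0]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vectors):
    """Solve a : b :: c : x by finding the word closest to b - a + c."""
    target = vectors[b] - vectors[a] + vectors[c]
    # Exclude the three query words so the answer is a new word.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "king", "woman", vectors))  # prints "queen"
```

With real embeddings the same arithmetic is applied to a vocabulary of millions of words, which is why any bias baked into the vectors shows up in the answers.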
The most commonly used set of word vectors for these relationships was built with word2vec, a technique for learning word vectors from text, and covers a vocabulary of three million words and phrases drawn from Google News articles. Word vectors like these power everything from language translation to contextual web search and speech recognition, and they are used throughout the tech industry.
However, evaluate “father : doctor :: mother : x” and it will yield x = nurse. That is to say, father is to doctor as mother is to nurse. And “man : computer programmer :: woman : x” results in x = homemaker, which means man is to computer programmer as woman is to homemaker.
Clearly, these word relationships can carry gender biases that are hard to justify. Being of a certain gender has no inherent link to being a homemaker. The bias appears because whatever bias exists in the input texts (news articles, in this case) is propagated through the system.
So if “computer programmer” is more closely associated with men than women in the articles, then a search for “computer programmer CVs” would potentially rank men more highly than women.
This is unfortunately a reflection of the subconscious social biases that creep through in our language everywhere.
Going back to the vector algebra, sexism can be thought of as a warp over this word vector space. So fixing the bias in the word relationships is then a question of applying an opposite warp while still preserving the overall spatial structure, according to a team of researchers from Boston University and Microsoft Research. (Paper: Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings)
They do this by searching for word pairs similar to “she : he” to generate a list of candidate gender analogies (e.g., midwife : doctor; sewing : carpentry; registered_nurse : physician; hairdresser : barber; nanny : chauffeur). They then use Amazon’s Mechanical Turk to have people manually classify each analogy as appropriate or inappropriate.
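The search for pairs similar to "she : he" can be sketched roughly as follows. This is a simplified illustration, not the researchers' actual scoring procedure: the vectors are invented toy values, and the `gender_analogies` function and its `threshold` parameter are assumptions made for this example. A pair (x, y) is kept when the difference between its vectors points in nearly the same direction as the difference between "she" and "he".

```python
import numpy as np
from itertools import permutations

# Toy vectors, made up for this example; not taken from any real embedding.
vectors = {
    "she":    np.array([0.0, 1.0, 0.0]),
    "he":     np.array([1.0, 0.0, 0.0]),
    "nurse":  np.array([0.1, 0.9, 1.0]),
    "doctor": np.array([0.9, 0.1, 1.0]),
    "table":  np.array([0.5, 0.5, -1.0]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def gender_analogies(vectors, seed=("she", "he"), threshold=0.9):
    """Return (x, y, score) for pairs whose difference is parallel to the seed pair's."""
    direction = vectors[seed[0]] - vectors[seed[1]]
    words = [w for w in vectors if w not in seed]
    pairs = []
    for x, y in permutations(words, 2):
        score = cosine(vectors[x] - vectors[y], direction)
        if score >= threshold:
            pairs.append((x, y, score))
    # Strongest analogies first.
    return sorted(pairs, key=lambda p: -p[2])

for x, y, score in gender_analogies(vectors):
    print(f"she : he :: {x} : {y}")
```

On these toy vectors only the nurse : doctor pair passes the threshold. Deciding whether such a pair is appropriate or inappropriate is the part that still needs human judgment.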
With this information, the research team could work out how the sexism was expressed mathematically in the vector space and how the space could be transformed to remove the warping. The transformed vector space was then used to generate a better list of gender analogies with significantly less gender bias (e.g., maid : housekeeper; gals : dudes; hen : cock; daughter : son).
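One simple way to picture such a transformation is to remove, from words that should be gender-neutral, the part of their vector that points along a gender direction. The sketch below illustrates that idea only; it is not presented as the researchers' full method. The vectors are invented toy values, and the `gender_direction` and `neutralize` helpers are hypothetical names chosen for this example.

```python
import numpy as np

# Toy vectors, made up for this example; not taken from any real embedding.
vectors = {
    "she":        np.array([0.0, 1.0, 0.0]),
    "he":         np.array([1.0, 0.0, 0.0]),
    "programmer": np.array([0.8, 0.2, 1.0]),
    "homemaker":  np.array([0.2, 0.8, 1.0]),
}

def gender_direction(vectors):
    """Unit vector pointing from 'he' towards 'she' (an assumed, simplified gender axis)."""
    d = vectors["she"] - vectors["he"]
    return d / np.linalg.norm(d)

def neutralize(v, direction):
    """Remove the component of v that lies along the given unit direction."""
    return v - np.dot(v, direction) * direction

g = gender_direction(vectors)
for word in ("programmer", "homemaker"):
    before = float(np.dot(vectors[word], g))
    after = float(np.dot(neutralize(vectors[word], g), g))
    print(f"{word}: gender component {before:.3f} before, {abs(after):.3f} after")
```

After neutralizing, both occupation words sit at the same position along the gender axis, while their remaining coordinates, which stand in for everything else the vectors encode, are left unchanged. Definitionally gendered pairs such as daughter : son would be left alone.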
Reducing the bias in today’s computer systems and technology is one step towards reducing gender bias in our society. There is nothing scarier than the prospect of tomorrow's artificial intelligence being built by people who are unfamiliar with the socioeconomic factors that affect people's everyday lives. Everything from hiring and policing to risk analysis relies more and more on machine intelligence every day.
At the very least, machine learning should not be inadvertently amplifying our biases as we move towards a more automated, artificially intelligent world.