
Bias in predictive algorithms

A machine learning algorithm can make a prediction about the future based on the historical data it's been trained on. But when that training data comes from a world full of inequalities, the algorithm may simply be learning how to keep propagating those inequalities.

Criminal justice

In the criminal justice system, a risk assessment score predicts whether someone accused of a crime is likely to commit another crime. A low-risk defendant is deemed unlikely to commit another crime, while a high-risk defendant is deemed very likely to commit another crime. Risk assessments are used at various stages in the system, from assigning bond amounts to determining sentences.
Computer algorithms are increasingly being used to generate risk assessment scores, since an algorithm is cheaper to employ than a human and can draw on much more data.
Diagram of a risk assessment algorithm which takes an input of information about a defendant and outputs either low, medium, or high risk.
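To make the diagram concrete, here is a minimal sketch of what such a pipeline could look like. The features, weights, and cutoffs below are invented for illustration; real risk assessment tools are proprietary and far more complex.

```python
# Hypothetical risk scoring sketch: defendant features go in, a bucketed
# risk label comes out. All weights and cutoffs are invented.

def risk_label(age: int, prior_arrests: int) -> str:
    # Invented rule: more prior arrests and a younger age push the score up.
    score = 0.6 * prior_arrests + 0.4 * max(0, 40 - age) / 10
    if score < 2:
        return "low"
    if score < 4:
        return "medium"
    return "high"

print(risk_label(age=55, prior_arrests=0))   # low
print(risk_label(age=22, prior_arrests=3))   # medium
print(risk_label(age=19, prior_arrests=7))   # high
```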
In 2016, the investigative journalism organization ProPublica analyzed the scores from an algorithm used in Florida on 7,000 people over a two-year period and checked whether those people actually did commit subsequent crimes.[1]
They discovered that the algorithm underestimated the likelihood that white defendants would re-offend but overestimated the likelihood for Black defendants:
                                            White    Black
Labeled higher risk, but didn't re-offend   23.5%    44.9%
Labeled lower risk, yet did re-offend       47.7%    28.0%
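The percentages in that table can be reproduced from raw records with a few lines of code. This sketch assumes each record stores the defendant's group, whether the algorithm labeled them higher risk, and whether they actually re-offended; the record format and values are placeholders, not ProPublica's actual data layout.

```python
# Compute ProPublica-style error rates from (group, labeled_high_risk,
# re_offended) records. The records below are invented placeholders.

records = [
    # (group, labeled_high_risk, re_offended)
    ("white", False, False),
    ("white", False, True),
    ("Black", True,  False),
    ("Black", True,  True),
    # ...a real analysis would have thousands of rows
]

def error_rates(records, group):
    rows = [r for r in records if r[0] == group]
    didnt_reoffend = [r for r in rows if not r[2]]
    did_reoffend   = [r for r in rows if r[2]]
    # "Labeled higher risk, but didn't re-offend" (false positive rate)
    fpr = sum(r[1] for r in didnt_reoffend) / len(didnt_reoffend)
    # "Labeled lower risk, yet did re-offend" (false negative rate)
    fnr = sum(not r[1] for r in did_reoffend) / len(did_reoffend)
    return fpr, fnr

for group in ("white", "Black"):
    fpr, fnr = error_rates(records, group)
    print(f"{group}: {fpr:.1%} labeled higher risk but didn't re-offend, "
          f"{fnr:.1%} labeled lower risk yet did re-offend")
```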
The risk assessment algorithm wasn't trained on data that included the race of defendants, yet it learned to have a racial bias. How?
The code for that particular algorithm can't be audited directly, since, like many machine learning algorithms, it is a closely guarded company secret. However, Stanford researchers reverse-engineered its results and produced similar predictions based on two primary factors: the age of the defendant and the number of previously committed crimes.[2]
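One way to picture that reverse-engineering is to fit a simple, transparent model to the black box's own labels and check how often the two agree. The data and the use of scikit-learn below are assumptions for illustration, not the researchers' actual method.

```python
# Sketch of "reverse engineering" a black-box risk score: fit a transparent
# model (logistic regression on age and prior arrests) to the black box's
# own labels and measure agreement. The data below is invented;
# scikit-learn is assumed to be installed.
from sklearn.linear_model import LogisticRegression

# Each row: [age, prior_arrests]; label: 1 if the black box said "high risk"
X = [[19, 5], [23, 3], [30, 1], [45, 0], [52, 2], [60, 0], [21, 4], [35, 6]]
y = [1, 1, 0, 0, 0, 0, 1, 1]

simple_model = LogisticRegression().fit(X, y)

agreement = simple_model.score(X, y)  # fraction of black-box labels reproduced
print(f"Transparent 2-feature model matches the black box {agreement:.0%} of the time")
print("Coefficients (age, priors):", simple_model.coef_[0])
```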
In America, Black people have historically been arrested at a higher rate than white people, due to factors like increased policing in urban areas. For example, the ACLU found that in 2010, Black people were 3.7 times more likely than white people to be arrested for marijuana possession, even though the two groups used marijuana at comparable rates.[3]
A chart of the following data on marijuana possession arrest rates per 100,000 people:
State          Black arrests   White arrests
Iowa           1454            174
D.C.           1489            185
Minnesota      835             107
Illinois       1526            202
Wisconsin      1285            215
Kentucky       697             117
Pennsylvania   606             117
A machine learning algorithm that's trained on current arrest data learns to penalize defendants for their past arrests, since it has no way to know which of those arrests resulted from biased systems and humans.
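That feedback loop can be simulated. In the sketch below (all rates invented), two groups behave identically, but one group's offenses are recorded as arrests far more often; any model that scores risk from recorded prior arrests will then rate that group as riskier.

```python
# Simulation sketch: two groups with identical underlying behavior, but
# group B's offenses are recorded as arrests more often. A model that only
# sees recorded arrests then treats group B as higher risk.
# All rates below are invented for illustration.
import random

random.seed(0)

def average_recorded_priors(arrest_rate, n=10_000):
    """Average recorded prior arrests per person in a simulated group.

    Everyone offends at the same rate (20% per year over 5 years); only
    the chance that an offense becomes a recorded arrest differs."""
    total_recorded = 0
    for _ in range(n):
        offenses = sum(random.random() < 0.20 for _ in range(5))
        recorded = sum(random.random() < arrest_rate for _ in range(offenses))
        total_recorded += recorded
    return total_recorded / n

avg_a = average_recorded_priors(arrest_rate=0.10)  # group A: 10% of offenses recorded
avg_b = average_recorded_priors(arrest_rate=0.37)  # group B: 3.7x the recording rate

print(f"Average recorded priors, group A: {avg_a:.2f}")
print(f"Average recorded priors, group B: {avg_b:.2f}")
# A model trained on recorded priors now rates group B as "riskier",
# even though both groups' underlying behavior was identical.
```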
🤔 The researchers from Stanford discovered that humans have the same bias when making risk assessments. Which is worse, a biased human or a biased computer algorithm? What actions could reduce that bias?

Hiring decisions

Big companies receive hundreds of applications for each job role. Each application must be screened to decide if the applicant should be interviewed. Traditionally, screening is done by recruiters in the HR department, but it's a tedious task and risks subjecting applicants to the biases of the human recruiter.
Many companies are starting to automate screening with algorithms powered by machine learning, with the hope of increasing the efficiency and objectivity of the process.
A screening algorithm reviews an applicant's résumé and assigns a score that predicts the applicant's fit for the job role.
A diagram of the screening algorithm process. A resume is inputted into an algorithm (represented as a black box) and there are three possible outputs: "Great fit", "Good fit", and "Not a fit".
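As a rough sketch of that pipeline, a simple screening model might sum learned word weights and bucket the total. The words and weights below are invented, but they show how a weight learned from biased past ratings bakes the bias into every future decision.

```python
# Sketch of a résumé screening pipeline: sum learned word weights, then
# bucket the score. The weights here are invented; in a real system they
# would be learned from past hiring decisions, which is exactly how
# historical bias can leak in.

learned_weights = {
    "python": 1.5,
    "managed": 1.0,
    "volunteer": 0.3,
    # A negative weight like this is what Amazon reportedly found its
    # model had learned for the word "women's".
    "women's": -0.8,
}

def screen(resume_text: str) -> str:
    words = resume_text.lower().split()
    score = sum(learned_weights.get(w, 0.0) for w in words)
    if score >= 2.0:
        return "Great fit"
    if score >= 1.0:
        return "Good fit"
    return "Not a fit"

print(screen("Built Python tools and managed releases"))         # "Great fit"
print(screen("Built Python tools, women's chess club captain"))  # "Not a fit"
```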
In 2014, Amazon experimented with using software to screen job applicants.[4] However, they discovered that the software preferred male candidates over female candidates, penalizing résumés that contained the word "women's" (as in "women's chess club") and downgrading graduates of all-women colleges. How did the software become sexist?
The screening software was trained on a decade of résumés that had been previously rated by employees as part of the hiring process.
In 2014, Amazon employees were largely male:
A bar chart of the following data on gender breakdown in job roles at Amazon:
Job role                           % Female   % Male
Senior officials                   18         82
Mid-level officials and managers   21         79
Professionals                      25         75
Technicians                        13         87
Laborers                           45         55
Chart source: Seattle Times
Even if the male employees weren't intentionally sexist, they were rating résumés based on their own personal experience. In addition, many résumés come from referrals, and male employees have generally worked with other men. The result is a training data set with relatively little representation of female applicants and biased scores for the résumés it does include.
Another source of potential bias is the libraries used for natural language processing. Text parsing algorithms often rely on a library of word vectors that rank how similar words are to one another based on how often they co-occur in digitized texts. A 2018 study found bias in one of the most popular word vector libraries, revealing that terms related to science and math were more closely associated with males, while terms related to the arts were more closely associated with females.[5]
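The kind of association the study measured can be illustrated with cosine similarity between word vectors. The tiny 3-dimensional vectors below are invented stand-ins; real word vector libraries use hundreds of dimensions learned from huge text corpora.

```python
# Sketch of how word-vector bias is measured: compare the cosine similarity
# of a discipline word ("science", "art") to gendered words ("he", "she").
# The vectors below are invented for illustration.
import math

vectors = {
    "science": [0.9, 0.2, 0.1],
    "art":     [0.1, 0.8, 0.3],
    "he":      [0.8, 0.1, 0.4],
    "she":     [0.2, 0.9, 0.4],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

for word in ("science", "art"):
    to_he  = cosine(vectors[word], vectors["he"])
    to_she = cosine(vectors[word], vectors["she"])
    print(f"{word}: similarity to 'he' = {to_he:.2f}, to 'she' = {to_she:.2f}")
```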
A scatter plot that shows the association of subject discipline terms with gender. It shows that the arts are more associated with females and science is more associated with males.
Chart source: ArXiv.org
That same study found more positive sentiment associated with European-American names than African-American names:
A scatter plot showing the association of European-American and African-American names with sentiment. African-American names are more correlated with negative sentiment while European-American names are more correlated with positive sentiment.
Chart source: ArXiv.org
Amazon's attempt at automated applicant screening failed, but some companies are still trying to create automated hiring tools that are free from human bias.
Pymetrics is one such company, offering a screening service powered by machine learning. Since it is so difficult to evaluate a candidate based only on a résumé, the Pymetrics process also incorporates a behavioral assessment. In addition, whenever they tweak their algorithm, they test it on thousands of past applicants and check for discrimination.[6] They've turned that audit process into open-source software for other companies to use, too.[7]
It is nearly impossible to know whether a screening algorithm is rejecting candidates who would have been a great fit for the job, since a rejected candidate never gets the chance to actually work in that role. That's why it's doubly important for screening algorithms to be thoroughly audited.
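One common check that such an audit might include is the "four-fifths rule": compare each group's selection rate to the most-selected group's rate and flag ratios below 0.8. The sketch below uses invented numbers and is a generic illustration, not necessarily the exact method Pymetrics or any other company uses.

```python
# Generic sketch of a disparate-impact audit using the "four-fifths rule":
# flag any group whose selection rate is less than 80% of the highest
# group's rate. The selection rates below are invented.

def adverse_impact_ratio(selection_rate, reference_rate):
    """Ratio of a group's selection rate to the reference group's rate."""
    return selection_rate / reference_rate

# Invented screening results on past applicants
pass_rates = {"men": 0.30, "women": 0.21}

reference = max(pass_rates, key=pass_rates.get)
for group, rate in pass_rates.items():
    ratio = adverse_impact_ratio(rate, pass_rates[reference])
    flag = "OK" if ratio >= 0.8 else "potential adverse impact"
    print(f"{group}: selection rate {rate:.0%}, ratio {ratio:.2f} -> {flag}")
```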
🤔 Would you rather have a human or an algorithm screen you for a job? If you knew that an algorithm was reviewing your résumé, what would you change?

