
Machine Learning Bias (AI Bias)

Beware of Machine Learning (ML) Bias! Learn how to detect and prevent bias in your ML models ⬇️

ML bias, sometimes called algorithm bias or AI bias, occurs when an algorithm produces results that are systematically prejudiced due to flawed assumptions in the ML process.

Typically, bias enters ML systems through 1️⃣ faulty, poor, incomplete or prejudicial data sets, which result in inaccurate predictions, and 2️⃣ human error in designing or training the ML model that reflects cognitive biases.

So, what specifically are the common types of ML bias that can be brought into an ML system?

➡️ Algorithm bias. This occurs when there's a problem within the algorithm itself, that is, in the calculations that power the ML system.

➡️ Sample bias. This happens when there's a problem with the data used to train the ML model. In this type of bias, the data used are not representative enough to teach the system accurately.

➡️ Prejudice bias. In this case, the data used to train the system reflects existing prejudices, stereotypes and/or faulty societal assumptions, thereby introducing those same real-world biases into the machine learning itself.

➡️ Measurement bias. This bias arises due to underlying problems with the accuracy of the data and how it was measured or assessed.

➡️ Exclusion bias. This happens when an important data point is left out of the data being used, something that can happen if the modelers do not recognize the data point as significant.
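Sample bias is the easiest of these to check for in code. Here is a minimal sketch, assuming each training record carries a group label and that you have reference population shares to compare against. The function name `representation_gaps`, the groups "A" and "B", and all the numbers are made up for illustration.

```python
from collections import Counter


def representation_gaps(sample_groups, population_shares):
    """Return each group's share in the sample minus its expected population share.

    A positive gap means the group is over-represented in the sample,
    a negative gap means it is under-represented.
    """
    if not sample_groups:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in population_shares.items()
    }


# Hypothetical training sample: 80 records from group "A", 20 from group "B".
sample = ["A"] * 80 + ["B"] * 20

# Assumed reference population: an even split between the two groups.
gaps = representation_gaps(sample, {"A": 0.5, "B": 0.5})

for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A large negative gap is a hint that the model may not see enough examples of that group to learn from. How large is too large depends on the use case.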

How to prevent bias:

1️⃣ Establish practice standards, strong data governance and peer review.

2️⃣ Select training data that is appropriately representative and large enough to reflect the population.

3️⃣ Test and validate to ensure the results of ML systems do not reflect bias due to algorithms or the data sets.

4️⃣ Monitor ML systems as they perform their tasks to ensure biases do not sneak in over time.
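Steps 3️⃣ and 4️⃣ can be partly automated. The sketch below is one possible check, not a complete fairness audit: it compares the rate of positive predictions across groups and flags a batch for review when the lowest rate falls too far below the highest. The helper names, the sample batch and the 0.8 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Fraction of positive (1) predictions for each group."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {group: positives[group] / totals[group] for group in totals}


def parity_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means equal rates)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Hypothetical batch of model outputs with a group label per prediction.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = selection_rates(preds, groups)
ratio = parity_ratio(rates)

# Assumed alert threshold; pick a value that fits your own standards.
ALERT_THRESHOLD = 0.8
needs_review = ratio < ALERT_THRESHOLD

print(rates, round(ratio, 2), needs_review)
```

Running the same check on every new batch of predictions, and logging the ratio, gives a simple way to notice when the gap between groups starts to widen.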

It is critical that organizations scrutinize the data being used to train ML models for incompleteness and potential bias.

Data should be representative of different races, genders, backgrounds and cultures that could be adversely affected.

Most important, data sets and ML algorithms should be transparent and routinely evaluated for quality and integrity.

⚠️ What would ML bias look like in healthcare?

A 2020 PNAS study found that gender imbalances in the training data sets of computer-aided diagnosis (CAD) systems led to lower accuracy for the underrepresented group. Specifically, when the CAD system was trained predominantly on men’s X-rays, the accuracy of women’s diagnoses was dramatically lower.
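A gap like this only shows up if accuracy is reported per group rather than as a single overall number. Here is a minimal sketch of that breakdown. The labels and predictions are invented for illustration and are not data from the study, and `accuracy_by_group` is a hypothetical helper name.

```python
def accuracy_by_group(y_true, y_pred, groups):
    """Share of correct predictions, computed separately for each group."""
    correct = {}
    totals = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] = totals.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + int(truth == pred)
    return {group: correct[group] / totals[group] for group in totals}


# Made-up diagnoses (1 = condition present) for eight patients.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
sex = ["M", "M", "M", "M", "F", "F", "F", "F"]

per_group = accuracy_by_group(y_true, y_pred, sex)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
gap = max(per_group.values()) - min(per_group.values())

print(per_group, overall, gap)
```

In this toy data the overall accuracy looks acceptable, while the per-group view shows that one group is served far worse than the other.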


Do you have best practices to share about how to solve for ML bias? 📝 📝

