Eliminating bias from AI datasets: The imperative and how Quadrant helps

Picture of Saif Kakakhel

In the modern world, Artificial Intelligence (AI) is being leveraged across various industries to tackle issues as diverse as inventory management in retail and route optimization in navigation. Because of its potential, AI is increasingly used in consequential areas such as finance, marketing, and human resources, which raises the question: will the use of AI in these (and other) fields remedy or amplify the problems that lead to flawed decision-making? This article examines ‘fairness’ in AI systems, describes real-world instances of AI-based discrimination, discusses existing approaches to mitigating AI bias, and more.

AI Bias: What it is and why it’s problematic  


In our rapidly evolving technological landscape, AI emerges as a monumental force. Yet, its promise dims if it doesn't serve every segment of society—regardless of gender, race, or socioeconomic status. Regrettably, biases embedded in AI can amplify existing societal disparities, favoring some while sidelining others. Our challenge is twofold: harnessing AI's immense potential while ensuring its designs champion fairness and inclusivity. 

Many experts believe that AI can and should be used to mitigate human bias in key decision-making processes, while others believe AI will only serve to cement and amplify societal biases as the technology scales with time. Here’s why both claims hold merit and warrant investigation.   

What is AI bias, and why does it matter?  

Defining AI bias is not straightforward, as bias and fairness are social constructs. In the context of AI, ‘bias’ does not refer simply to a preference, but rather to systematic discrimination by a model or algorithm against individuals or groups based on certain characteristics, such as race, ethnicity, sexual orientation, or gender. This list is not exhaustive: a much wider range of sensitive attributes can affect an AI system's decisions about people.

When an AI model is biased against one or more demographic groups, it can perpetuate unjust outcomes that have serious ramifications for society at large:

  • Erosion of Trust: Biased AI can erode trust among its users. For instance, in sectors like healthcare or finance, any perceived unfairness can have serious consequences since people’s access to affordable medical treatment and credit can be impacted. 
  • Economic Impact: Wrong AI-driven recommendations or predictions can result in significant financial losses for businesses. 
  • Deepening Societal Divides: Instead of bridging societal gaps, an AI model built on biased data can further entrench stereotypes and widen societal divides. 

How can AI help mitigate human bias in decision-making? 

AI has the potential to decrease bias in decision-making because machine learning algorithms are trained to consider only the variables that improve their predictive accuracy on the training data. In other words, automation limits the instances in which a human’s subjective interpretation of data can affect a decision (for example, not considering a candidate based on poor credit history, refusing to rent out an apartment due to race or ethnicity, and so forth). In fact, research has shown that automated financial underwriting systems are especially beneficial for historically disadvantaged applicants. However, even though AI can create more equitable and fair outcomes, a system that is itself biased because of skewed or unrepresentative training data can go very wrong.

Real-world examples of AI bias 

In 2014, software engineers at Amazon began working on a recruiting tool that would allow the hiring team to filter many applications and highlight those that stood out. By 2015, they had realized that it discriminated against female candidates for technical roles, and the project was eventually abandoned. Similarly, in the 2016 ProPublica investigation ‘Machine Bias’, Julia Angwin and her colleagues reported that COMPAS, an algorithmic tool used to predict recidivism, was nearly twice as likely to mislabel African American defendants who did not go on to reoffend as ‘high-risk’ compared to white defendants. Both examples illustrate the unfair and unjust outcomes that AI can replicate based on existing patterns of human behavior in the real world.
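The kind of disparity described in the COMPAS example is usually expressed as a difference in false positive rates: among people who did not reoffend, what share of each group was still labeled high-risk? The sketch below shows that arithmetic on a small synthetic dataset. The group names, the records, and the function name are all assumptions made up for illustration; none of it is real COMPAS data or ProPublica's actual methodology.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """Compute the false positive rate for each group.

    records: iterable of (group, predicted_high_risk, reoffended) tuples.
    A false positive is a person labeled high-risk who did not reoffend.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted_high_risk, reoffended in records:
        if not reoffended:
            negatives[group] += 1
            if predicted_high_risk:
                false_positives[group] += 1
    return {g: false_positives[g] / negatives[g] for g in negatives}


# Synthetic, hypothetical records (NOT real data):
# (group, predicted_high_risk, reoffended)
records = (
    [("group_a", True, False)] * 4
    + [("group_a", False, False)] * 6
    + [("group_a", True, True)] * 5
    + [("group_b", True, False)] * 2
    + [("group_b", False, False)] * 8
    + [("group_b", True, True)] * 5
)

rates = false_positive_rate_by_group(records)
disparity = rates["group_a"] / rates["group_b"]
print(rates)
print(f"group_a is mislabeled {disparity:.1f}x as often as group_b")
```

In this toy data, both groups have the same number of people who did not reoffend, yet group_a is flagged twice as often. A check like this only measures one notion of fairness; other definitions can disagree with it.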


How does bias arise in AI models?  

Although biases in AI are typically considered algorithmic in nature, the underlying data is most often the crux of the issue. Models may be trained or fine-tuned on data containing human decisions, which is why they can embed human biases at scale. Additionally, when data sampling is unrepresentative of the population at large, patterns of bias get baked into data distributions. As a result, every stage of the AI lifecycle, from problem formulation and model building to system deployment and monitoring, can be tainted by bias.


How to counteract biases in AI? 

There are various sources of bias in AI, including – but not limited to – sampling bias, algorithmic bias, confirmation bias, and interaction bias. Two of these sources can be traced to AI models across many sectors and applications: sampling and algorithmic bias. Algorithmic bias is highly technical and is outside the scope of our investigation, whereas sampling bias is ubiquitous and can be mitigated with good training and data management.  

Sampling bias is a statistical problem in which the data selected from a population does not reflect the distribution of that population. The sample might be skewed towards a certain subset of the group. To remedy this issue, it is important to improve data collection through inclusive, representative sampling techniques and to use humans to audit both the data and the models. Only if data sampling and sourcing are truly representative of the underlying population can an AI model be trusted to derive insights for wider audiences and populations.
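A simple audit for sampling bias is to compare each group's share of the training sample with its share of the underlying population. The sketch below does that and also computes a per-group weight (population share divided by sample share), a common statistical correction known as post-stratification weighting. The weighting step is an addition for illustration, not a method this article prescribes, and the group names, population shares, and function name are hypothetical.

```python
from collections import Counter


def representation_report(sample_groups, population_shares):
    """Compare group shares in a sample against known population shares.

    sample_groups: list of group labels, one per sampled record.
    population_shares: dict mapping group label -> share of the population.
    Returns, per group, the sample share, population share, and a weight
    that would rebalance the sample (None if the group is missing entirely).
    """
    counts = Counter(sample_groups)
    total = len(sample_groups)
    report = {}
    for group, population_share in population_shares.items():
        sample_share = counts.get(group, 0) / total
        weight = population_share / sample_share if sample_share > 0 else None
        report[group] = {
            "sample_share": sample_share,
            "population_share": population_share,
            "weight": weight,
        }
    return report


# Hypothetical population shares and a skewed sample (illustrative only).
population_shares = {"urban": 0.5, "suburban": 0.3, "rural": 0.2}
sample = ["urban"] * 70 + ["suburban"] * 25 + ["rural"] * 5

report = representation_report(sample, population_shares)
for group, row in report.items():
    print(group, row)
```

Here rural records make up 5% of the sample but 20% of the population, so each would need roughly four times the weight to count proportionally. Weighting cannot invent information that was never collected, which is why better sourcing remains the primary fix.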


Other promising approaches for mitigating AI bias 

Ensure inclusivity by design and predict the impact of the technology: In this context, inclusivity refers to incorporating diverse perspectives in the design process itself. Said diversity can come in the form of gender, race, ethnicity, class, and culture. Foreseeability is about predicting the impact that the AI product will have post-deployment and over time. 


Perform extensive user testing: Get representatives from the diverse groups that will use your product to test it before it is released for broader audiences. 


Anticipate problems before deployment: Hire an AI ethicist for your development team so that fairness and non-discrimination risks can be proactively identified and resolved. This reduces the chance of releasing a biased system and can save you considerable resources and time.


Investigate underlying decision-making processes: When models that are trained on human decision-making show bias, the underlying human behavior should be probed to improve processes in the future.


Human oversight: Adopt a ‘human-in-the-loop’ approach whereby a model presents its recommendations to human evaluators or decision makers for approval. In other words, use a mechanism through which humans and machines work together to decrease bias. Firms can use either internal ‘red teams’ or third-party audits for this purpose. With this method, it is important to make a model’s decision-making process transparent so that evaluators know how much weight to give to different recommendations.
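One way to picture the human-oversight approach is as a routing rule that decides which model recommendations go to a human evaluator. The sketch below is a minimal, hypothetical version: it sends every rejection and every low-confidence recommendation to a review queue. The `Recommendation` fields, the 0.9 threshold, and the rule itself are assumptions for illustration; a stricter policy would send every recommendation to a human.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    applicant_id: str
    decision: str      # "approve" or "reject" (hypothetical labels)
    confidence: float  # model's confidence in its decision, 0.0 to 1.0


def needs_human_review(rec, min_confidence=0.9):
    """Route adverse or uncertain recommendations to a human evaluator.

    Assumed policy: rejections always get a human check, and so does any
    recommendation the model is not confident about.
    """
    return rec.decision == "reject" or rec.confidence < min_confidence


# Hypothetical model outputs.
recommendations = [
    Recommendation("a1", "approve", 0.97),
    Recommendation("a2", "approve", 0.62),
    Recommendation("a3", "reject", 0.99),
]

review_queue = [r.applicant_id for r in recommendations if needs_human_review(r)]
auto_processed = [r.applicant_id for r in recommendations if not needs_human_review(r)]
print("human review:", review_queue)
print("auto-processed:", auto_processed)
```

In practice, the reviewer would also need to see why the model made each recommendation, which is the transparency requirement described above.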


Mitigating AI Bias with Geolancer 

In the quest to reduce AI bias, the diversity and inclusivity of training data play a pivotal role. Geolancer, Quadrant's innovative crowdsourcing platform, is engineered for this precise need. It’s not just a data collection tool but a customizable solution that can be tailored to gather a range of data types - from Points of Interest (POI) and audio to video, LiDAR, and 3D images. 

The global reach of Geolancer's crowdsourcing model helps gather data that spans varied demographics and regions. Each piece of collected data is a step towards a less biased, more accurate, and fairer AI model. Businesses leveraging Geolancer aren’t just accessing data; they are tapping into insights that reflect the diversity of the global population. In this way, Quadrant’s Geolancer is not just a platform but a pathway to AI that better understands and serves everyone.


Towards a more equitable AI future 


In an age where AI influences nearly every facet of our lives, ensuring its fairness is not just a technical challenge but a societal imperative. Bias in AI, whether rooted in skewed data or reflective of broader societal inequities, risks perpetuating and amplifying these very disparities on a global scale. However, with proactive tools and platforms like Quadrant's Geolancer, we have the means to pivot towards a more inclusive and unbiased AI landscape. By placing a premium on diverse data sourcing and continuous innovation, we not only enhance the technical prowess of AI but also champion its potential as a tool for equity and progress. As we stride into the future, the commitment to fairness in AI will define not just the efficacy of our technologies but the kind of digital world we wish to inhabit. 


Interested in sourcing high-quality and diverse training data for your AI applications? Fill out the form below to speak with one of our savvy data consultants.



