Welcome back to our ongoing series on AI marking. We’ve touched on the innovative strides this technology has made, especially when it comes to AI-assisted essay marking. Alongside its remarkable benefits, such as the potential to reduce grading costs by approximately 60%, it’s crucial to address the challenges too, including the various types of AI bias.
In the previous blog post, we examined 10 common AI myths and discovered that AI marking has the potential to reduce bias in grading. However, we must guard against the AI inheriting biases present in its training data. Bias in grading can function like invisible ink, often unnoticeable under regular light but glaringly obvious under scrutiny. Although we often picture AI as a neutral, objective tool, it’s essential to understand that AI is a mirror that reflects its creators’ attributes – including their biases.
As we continue to integrate AI more deeply into our daily lives, the issue of AI bias in education, specifically its impact on grading and assessments, must be thoroughly examined and addressed. For AI to realise its true potential of enhancing our world and democratising opportunities, we must ensure that it is not just intelligent but also fair, unbiased, and genuinely representative of the diverse world it serves. So which types of AI bias should we watch out for?
Types Of AI Bias
Artificial intelligence and machine learning algorithms are only as smart as the data they are fed. They’re designed to recognise patterns and learn from them to make future decisions. If the input data reflects human biases, AI can perpetuate them.
Data bias occurs when the training data fed to an AI model is not representative of the environment in which the model will operate. It’s like teaching a child only about apples and expecting them to understand all fruits. For instance, if an AI is trained primarily on sample essays from urban test-takers, it may inadvertently favour language nuances, examples, or perspectives that are typical of urban environments. Thus, rural test-takers, or those discussing topics less familiar to urban experiences, might find their work undervalued or misinterpreted.
Data bias can happen in several ways:
Sampling bias

This occurs when the sample of assignments used for training differs from the full range of assignments the AI system will grade. For instance, if the training data consists primarily of science assignments, the system might not grade history or literature assignments as accurately.
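As a rough illustration, a quick distribution check over subject labels can surface this kind of skew before training ever begins. The data structure and labels here are purely hypothetical:

```python
from collections import Counter

def subject_distribution(training_set):
    """Return the share of each subject label in a training set.

    `training_set` is assumed to be a list of (essay_text, subject)
    pairs; both the structure and the labels are illustrative.
    """
    counts = Counter(subject for _, subject in training_set)
    total = sum(counts.values())
    return {subject: round(n / total, 2) for subject, n in counts.items()}

# A skewed hypothetical training set: overwhelmingly science essays.
sample = ([("...", "science")] * 80
          + [("...", "history")] * 15
          + [("...", "literature")] * 5)
print(subject_distribution(sample))
```

A heavily lopsided result like this is a signal to rebalance the training set before the model inherits the skew.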
Underrepresentation or overrepresentation
If certain assignments or grading styles are underrepresented or overrepresented in the training data, the AI system might not generalise its grading accurately. For instance, if the system is trained mainly on assignments graded by strict teachers, it might grade more harshly than is fair.
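One simple way to spot this is to compare average awarded grades per grader in the training data. A sketch, using invented grader IDs and grades:

```python
from statistics import mean

def grader_means(records):
    """Group grades by grader and compute each grader's mean grade.

    `records` is a hypothetical list of (grader_id, grade) pairs;
    a large spread between graders suggests the training data mixes
    very different grading styles.
    """
    by_grader = {}
    for grader, grade in records:
        by_grader.setdefault(grader, []).append(grade)
    return {grader: round(mean(grades), 1) for grader, grades in by_grader.items()}

records = [("grader_01", 52), ("grader_01", 55),
           ("grader_02", 71), ("grader_02", 68)]
print(grader_means(records))
```

If one grading style dominates the data, the model will tend to reproduce it; reweighting or sampling evenly across graders is one way to compensate.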
Outdated data

If the training data is outdated, the AI system might not adapt to new trends or educational shifts. For example, if the AI is trained on older English literature assignments, it might not correctly grade assignments that analyse contemporary literature.
Confirmation bias

Like humans, AI can also suffer from confirmation bias if its training data leans towards a particular outcome. If an AI grading system is consistently fed essays that favour a specific point of view and grade them highly, the system might learn to favour that perspective, devaluing contrasting viewpoints.
In the context of AI marking, an AI system might show confirmation bias if it tends to grade assignments in a way that confirms the tendencies in the training data. For instance, if the AI system was trained on data where teachers gave higher grades to assignments with longer word counts, the system might continue to favour longer assignments, even when they’re not necessarily of higher quality.
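A pattern like the word-count one can be checked directly: if awarded grades correlate strongly with essay length in the training data, the model may be learning length as a proxy for quality. A minimal sketch with invented numbers:

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))

# Hypothetical training records: (word_count, awarded_grade).
word_counts = [300, 450, 600, 750, 900]
grades = [55, 62, 70, 78, 85]
r = pearson(word_counts, grades)
print(round(r, 2))
```

A correlation near 1.0 doesn’t prove the grades were unfair, but it flags that length, rather than quality, may be driving them, and that the training set deserves a closer look.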
Bias in Interpretation
Algorithmic interpretation bias
This occurs if the AI system consistently interprets ambiguous data in a way that reflects the biases in the training data. For instance, if the AI system was trained on data where teachers tended to grade non-native English speakers lower, the system might continue this pattern, even when the quality of work is comparable.
User interpretation bias
This happens when users interpret the grades given by the AI system based on their own biases. For instance, a teacher might trust the AI system’s grading more when it matches their biases and discount it when it doesn’t.
Strategies for Mitigating AI Bias in Grading
As we’ve explored, AI marking offers transformative benefits, from significant cost reductions to streamlined workflows. However, the issue of various types of AI bias remains a critical concern that can undermine the technology’s potential for fair and accurate assessment.
For testing organisations and awarding bodies, particularly those considering the implementation of AI marking, it’s crucial to proactively address this challenge. Whether you’re selecting a third-party provider or using your own marking data to train the model, a comprehensive, multi-faceted approach is essential for mitigating the various types of AI bias. In the following section, we outline a unified guide to help you navigate this complex yet crucial aspect of AI-assisted marking.
Select a Credible Provider: Opt for a provider with a proven track record in ethical AI practices. Scrutinise their transparency reports and ask for case studies that demonstrate how they’ve effectively managed bias in the past. This will serve as a foundational layer if you’re using their base model.
Curate Balanced Data Sets: When using your own training data, ensure it’s comprehensive. Include assignments from diverse subjects, various grading styles, and a wide range of demographic groups. This diversity helps the model learn to evaluate assignments impartially.
Demand Comprehensive Validation: Whether you’re starting with a provider’s model or using your own data, validation is crucial. Use fairness metrics and cross-validation techniques to assess the model’s performance. Ensure that it doesn’t favour any particular group and that its evaluations are consistent across different demographics.
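One of the simplest fairness checks of this kind is to compare mean AI-awarded grades across demographic groups on a held-out validation set. A sketch, with hypothetical group labels and grades:

```python
from statistics import mean

def group_grade_gap(results):
    """Largest difference in mean AI-awarded grade between any two groups.

    `results` is a hypothetical list of (group_label, grade) pairs from
    a validation set; the labels are illustrative only.
    """
    by_group = {}
    for group, grade in results:
        by_group.setdefault(group, []).append(grade)
    means = {group: mean(grades) for group, grades in by_group.items()}
    return max(means.values()) - min(means.values())

validation = [("group_a", 70), ("group_a", 74),
              ("group_b", 69), ("group_b", 73)]
print(group_grade_gap(validation))
```

A small gap on comparable work is reassuring; a large one warrants investigation before the model goes anywhere near live marking. Fuller validation would also control for genuine quality differences between groups rather than comparing raw means alone.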
Review and Update Training Data: Periodically revisit your training data. Look for any emerging patterns of bias and make necessary adjustments. Keep the data updated to reflect current educational standards and the evolving diversity of your candidate pool.
Request Regular Audits: Make it a contractual or internal policy to conduct audits at regular intervals. These audits should scrutinise the model’s performance and flag any biases that may have crept in. If you’re working with a provider, ensure they commit to this level of scrutiny as well.
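A recurring audit can be as simple as double-marking a random sample: have human moderators re-mark a set of AI-graded scripts and track how far the two sets of grades diverge. A minimal sketch, with invented grade pairs:

```python
from statistics import mean

def audit_agreement(pairs):
    """Mean absolute difference between AI grades and human re-marks.

    `pairs` is a hypothetical audit sample of (ai_grade, human_grade)
    tuples; a rising value between audits suggests drift or bias
    creeping in.
    """
    return mean(abs(ai - human) for ai, human in pairs)

audit_sample = [(68, 70), (74, 73), (55, 60), (81, 80)]
print(audit_agreement(audit_sample))
```

Tracking this figure audit over audit gives a concrete, comparable number to hold the system (or the provider) to.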
Educate Assessors: Your human assessors need to understand the AI model’s limitations. Provide training sessions that equip them to critically interpret AI-generated results. This will enable them to make adjustments where the model may fall short, ensuring a more balanced evaluation process.
Engage Regulatory Bodies: Keep abreast of any guidelines or standards set by educational and technological authorities. This ensures that your AI marking system remains compliant, whether you’re using an external model or developing your own.
Inform Candidates: Transparency is key. Make sure candidates know that an AI system is part of the evaluation process. Explain its role, its limitations, and how it’s used in conjunction with human judgement to arrive at a final grade.
By diligently following these steps, testing organisations and awarding bodies can mitigate major types of AI bias and contribute to a more equitable and reliable AI-assisted marking system.
The road to minimising AI bias in education might be complex, but it’s achievable with concerted effort. By implementing these strategies and working together, we can make strides towards fairer and more objective AI grading, taking us one step closer to an education system where every candidate gets the fair chance they deserve.
Stay tuned for our next blog, where we’ll explore how we can overcome the challenge of transparency in AI marking.