The Future of EdTech: Sojin Lee talks changing her life path, founding a revolutionary AI marking company, and her alliance with Cirrus and CAI


It’s not every day you get to pick the brain of an AI trailblazer, but luckily for us, we got an afternoon to spend with Sojin Lee, founder of Blees AI. From the initial challenges of data collection to the exhilarating breakthroughs in AI essay marking, Lee offers a candid look into the journey of Blees AI. This interview sheds light on the intricacies of integrating AI into the complex world of educational assessments, revealing the technological skill, ethical considerations, and collaborative efforts that drive this company.

Join us as we explore the motivations, challenges, and triumphs behind Blees AI, and get a glimpse into the future of education and technology, through the eyes of the founder.

Thanks so much for joining us, Sojin. Let’s start at the beginning: Can you explain what motivated you to venture into the world of AI? 

“Certainly, let me share a bit about my journey and how it led me to AI. I was at a crossroads, looking for a change that would leverage my deep knowledge in examination processes and propel me into a dynamic new industry. You could say I was in the midst of a mid-thirties crisis, eager to leave a secure job in a familiar industry for something more fulfilling. I’ve always had a keen interest in IT and AI, yet I wasn’t sure how to pivot my career path in that direction. It was scary, leaving the security of a stable career without a clear plan for what lay ahead. I wasn’t unhappy in my previous role, but the idea of being tethered to that one job for life felt confining. I wanted to chase my passions, even if it meant leaving behind a well-constructed career. So, I took the leap.

While travelling after quitting, I indulged in some self-reflection, journaling my interests, aspirations, and the professional expertise I had gained over a decade. Despite my various interests — travelling, health, and entrepreneurship — I knew transforming any of them into a business would be a steep learning curve. After all, expertise isn’t just about skills; it’s also about having a network and the know-how to make a living.

It was during my travels that an opportunity knocked on my door. An old friend in Korea, who had made a name for herself in the fashion world, suggested I partner with her to take her business international. It seemed perfect — it ticked all the boxes for my interests. But soon, reality hit. I realised I was out of my depth, my expertise did not match what was needed, and I found myself reverting to what I knew best: finance and accounting advice, areas her business wasn’t even ready to delve into.

Reflecting on that experience, I recognised the importance of not wholly abandoning my expertise. In Malcolm Gladwell’s words, it takes roughly 10,000 hours to master something — and I had already put in those hours in my field. So, the key wasn’t to start from scratch but to pivot from a foundation of expertise.

“The key wasn’t to start from scratch but to pivot from a foundation of expertise.”

Sojin Lee, Co-Founder & CEO, Blees AI

That’s where the turn of events came full circle. I met my co-founder, an engineer specialising in Natural Language Processing and machine learning at IBM Watson. Our skills meshed well together: with his technical expertise and my industry insights, we established Blees AI, focusing on automating essay grading with AI. Fittingly, my first clients came from the network I had built in my previous industry.

I share my story as an example, especially for those who’ve built substantial expertise and are yearning for a new chapter. Chances are, the next venture and the first clients are closer than one might think. One just needs to find a complementary partner who shares the vision and can introduce them to the space that intrigues them. That’s the essence of what I did with Blees AI, and it’s a decision I’ve never looked back on.”

What main challenges did you recognise in essay marking before Blees AI stepped in?

“Before Blees AI came along, the process of essay marking was like navigating a minefield of challenges. My time at the examination department of CPA Canada taught me a lot. I trained markers and graded papers myself, and I saw the struggles firsthand. Human markers strive for consistency, but we’re only human. Mood swings, fatigue — you name it — can all skew the marking. And the pressure to churn out grades quickly while still being fair and consistent was a stressful balancing act.”

How did these challenges impact the end-users [test-takers and organisations]?

“Let’s think about the organisations that are most affected at the end of the day — exam bodies and students. These organisations are on a mission to be as fair as possible and give valuable feedback to students, but maintaining consistency is tough. Plus, they often have to incur heavy costs for professional markers. Imagine spending up to €10 per script, with hundreds of thousands of papers to mark each year. Some are even spending up to €20 per script! They’re also racing against time to meet reporting deadlines.

For students, the stakes are just as high. It’s not just about getting a grade; it’s about planning their career path and alleviating their anxiety. And for those needing to retake exams, the wait can be so stressful, as can the constraints of testing schedules. They’re hungry for timely and actionable feedback so they can improve and move forward.

Through the implementation of automated essay scoring, costs can be reduced, making exams more financially accessible to a broader candidate group. Additionally, the provision of prompt feedback empowers students to enhance their learning and skill development at a faster pace. It’s a win-win!”

What was the central idea or concept behind Blees AI’s solution to the essay marking challenge?

“Blees AI’s approach to essay marking targeted the need for domain-specific assessment tools. Our specialties lie in disciplines with definitive answers—think accounting, finance, law, and medicine—where examination answers aren’t as subjective as those you might find in language or humanities papers. Our goal was not just to provide accurate grading but to ensure that feedback was timely. This immediacy is crucial; it means students can digest and act on the feedback while the material is still top of mind, thereby accelerating their learning curve and aiding their professional or academic advancement.

“Where a human marker might take several minutes to grade an essay, our system delivers results in mere seconds, all without compromising the quality of the assessment.”

Sojin Lee, Co-Founder & CEO, Blees AI

We’ve seen significant efficiencies thanks to our automated essay scoring system. The cost reduction is compelling—up to 60% savings over traditional human grading. That’s not to mention the speed: where a human marker might take several minutes to grade an essay, our system delivers results in mere seconds, all without compromising the quality of the assessment.”

How did Blees AI differentiate its solution from other potential solutions in the market?

“Blees AI has carved out its niche in the market by focusing on two key areas. First, we’ve made sure that our platform is incredibly user-friendly. It’s designed for the experts and educators who are brilliant in their fields but may not have a technical background. They can work with our system effortlessly, without needing to consult IT specialists or data scientists. This autonomy and ease of use set us apart—it’s a turnkey solution that champions simplicity.

Secondly, our grading approach sets us apart. We’ve zeroed in on domain-specific essays with binary grading parameters—right or wrong answers—as opposed to the more open-ended essay grading that many other companies tackle. This specialisation is significant because it requires a different technological approach. Our AI is fine-tuned for these types of assessments, offering a tailored, precision tool for our clients’ specific needs.”

Now let’s talk start-up experience, no small task! What initial challenges did you face while setting up Blees AI, and how did you overcome them?

“In the early days of Blees AI, the biggest challenge for us was gathering the data needed to train our AI. We needed a substantial number of scripts — at least 200 to 300 — to create a robust system. This volume of data was hard to come by. Primarily, only large certification bodies or big test prep companies had such an extensive repository. But then, getting access to these scripts was another uphill battle, especially with organisations like certification bodies. They had stringent security protocols and lengthy approval processes, making data acquisition a complex task.

To navigate this, we leveraged our personal networks, which proved to be a game-changer. We were fortunate to connect with an organisation that provided us with student responses from a previous exam that had been compromised. This break was crucial for us. After conducting a successful pilot test that showcased our AI’s potential, this organisation not only provided us with essential data but also became one of our first clients.”

Can you recount a pivotal moment in the early stages of Blees AI that confirmed you were on the right path?

“There was a defining moment that really solidified our confidence in the path we had taken. It was during our first proof-of-concept pilot. Here, we had to demonstrate that our AI could match the grading standards of human markers. The benchmark we aimed for was an 80% consistency rate with senior markers. This figure wasn’t arbitrary; it mirrored the real-world scenario where junior markers are trained to reach this level of agreement with senior markers before they’re trusted with live marking.

The breakthrough moment was when our AI consistently exceeded this 80% threshold, 83.5% if I remember correctly! It wasn’t just about matching human performance; it was about achieving a level of reliability and consistency that was critical for our client. Once we hit this milestone, our first client signed on, which was a huge vote of confidence. It was a signal that we were definitely onto something substantial, and it inspired us to go deeper into developing our AI capabilities.”
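The 80% benchmark Lee describes is, at heart, an agreement rate between the AI’s marks and a senior marker’s. A minimal sketch of how such a rate could be computed — the function and figures below are purely illustrative, not Blees AI’s actual evaluation code:

```python
# Illustrative only: agreement rate between AI marks and a senior
# marker's marks, counted as the share of scripts receiving the same
# grade. The grade lists below are made-up sample data.

def agreement_rate(ai_marks: list[str], human_marks: list[str]) -> float:
    """Percentage of scripts where both sets of marks agree."""
    assert len(ai_marks) == len(human_marks)
    same = sum(a == h for a, h in zip(ai_marks, human_marks))
    return 100 * same / len(ai_marks)

ai    = ["A", "B", "B", "C", "A"]
human = ["A", "B", "C", "C", "A"]
print(f"{agreement_rate(ai, human):.1f}%")  # 80.0%
```

In practice the comparison would run over hundreds of scripts, and disagreements would be routed to arbitration, as described later in the interview.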

Can you give us some background on the organisation? Why were they interested in getting started with AI essay marking? 

“Our clients are drawn to AI essay marking for several reasons. First and foremost is cost efficiency. Traditional essay marking is costly, often involving certified professionals whose rates can soar from $60 to $100 an hour. That translates to a significant expense per script. Then, there’s the factor of time. Many of our clients operate under tight deadlines and need results fast. AI is a game-changer here, drastically speeding up the grading process and easing the time pressures associated with manual marking.

Another crucial aspect is maintaining high-quality grading and enhancing the student learning experience. Consider one of our clients who handles 350,000 scripts annually with a team of 150 markers nationwide. Achieving uniformity in manual grading at this scale is an immense challenge. This is where AI becomes invaluable, ensuring consistent grading and enabling timely, high-quality feedback to students, aligning perfectly with our clients’ educational goals.”

What are the key steps in implementing Blees AI’s essay marking system?

“So there are, in a nutshell, four steps. First, we need to prepare the data. This is where we start by organising and refining the data. Usually it’s 200 to 300 scripts. This also involves converting student responses into a format that’s readable and processable by our AI.

Then, we need to annotate the data. At this stage, we select and annotate suitable sentences to train the AI. On our platform, this involves the client going through and identifying correct answers that warrant scoring.

Post-annotation, we build the AI model and develop a tailored scoring logic, ranging from numerical scores (marks of 60, 70, 95, and so on) to letter grades (A+, A, B-, and so on).

The final step involves the AI grading student scripts that were not part of its training set, ensuring it can accurately assess a diverse array of responses. After that, the AI can grade at or above human accuracy.”
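The four steps above can be sketched as a toy pipeline. Everything here is hypothetical — the real system trains NLP models, whereas this illustration simply matches annotated key phrases — but it shows how data preparation, annotation, scoring logic, and grading of unseen scripts fit together:

```python
# A highly simplified sketch of the four-step workflow described above.
# All names and logic are hypothetical: a stand-in for the trained
# models Blees AI actually uses.

def prepare(script: str) -> str:
    """Step 1: normalise a raw student response into processable text."""
    return " ".join(script.lower().split())

def annotate(model_answers: list[str]) -> list[str]:
    """Step 2: the client marks the sentences that warrant scoring."""
    return [prepare(s) for s in model_answers]

def build_scorer(annotated: list[str], grade_bands: dict[int, str]):
    """Step 3: build a scoring function with client-specific logic."""
    def score(script: str) -> tuple[int, str]:
        text = prepare(script)
        points = sum(1 for key in annotated if key in text)
        pct = round(100 * points / len(annotated))
        # Map a numerical score onto the client's letter-grade bands.
        letter = next(g for cutoff, g in sorted(grade_bands.items(),
                                                reverse=True) if pct >= cutoff)
        return pct, letter
    return score

# Step 4: grade scripts that were not part of the training set.
keys = annotate(["Debits must equal credits",
                 "Accruals match revenue to expenses"])
scorer = build_scorer(keys, {80: "A", 60: "B", 0: "C"})
print(scorer("Remember that debits must equal credits at all times."))  # (50, 'C')
```

The design choice mirrors the interview: the client supplies annotations (step 2) without touching the model internals, and the scoring logic (step 3) is configured per client rather than hard-coded.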

Were there any unanticipated obstacles during the project’s execution, and how were they managed?

“One of the main challenges is data collection. Usually 200 to 300 scripts is sufficient, but sometimes essay questions can be really obscure and more scripts are needed. Sourcing adequate training data was initially tough. To overcome this, we partnered with the Computer Science Department at Toronto Metropolitan University, exploring ‘few-shot learning’ techniques and methods for augmenting training data to enhance our AI’s learning process with limited yet diverse datasets.

Another issue was maintaining consistency in AI training, especially with variations in data annotation. We’re addressing this by exploring ‘Programmatic Labelling’ methods, which automate annotation using algorithms, followed by human verification for greater consistency.

Our last struggle was customising scoring logic to each client’s grading system while maintaining a user-friendly interface. To address this challenge effectively, we have partnered with Cirrus Assessment, an e-assessment provider with 15 years of experience developing accessible and user-friendly e-assessment platforms.”

And the “elephant in the room” when it comes to AI, how did you account for and minimise bias and ensure good ethics with your AI? 

“In addressing bias and upholding ethical standards in our AI, we’ve initiated the following research and development efforts, which are key components of the work packages outlined in the Eurostars grant we obtained from the EU in collaboration with Cirrus. We are currently working on integrating tools like LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) into our system. LIME helps us understand the reasoning behind our AI’s decisions, while SHAP provides detailed insights into the factors influencing these decisions. These tools enable us to actively monitor and address potential biases, whether from the AI model or the training process, ensuring our AI operates within ethical boundaries.”
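To give a flavour of the idea behind SHAP: a feature’s Shapley value is its average marginal contribution to a model’s output across all possible feature coalitions. The toy grader below is entirely hypothetical — real SHAP and LIME tooling operates on trained models — but it computes exact Shapley values for two answer features, including an interaction effect:

```python
from itertools import combinations
from math import factorial

# Toy illustration of the Shapley-value idea behind SHAP. The grader is
# a made-up rule-based stand-in, not the Blees AI model.

def toy_grader(features: set[str]) -> float:
    """Hypothetical grader: points for key concepts, with one interaction."""
    score = 0.0
    if "definition" in features:
        score += 30
    if "example" in features:
        score += 20
    # Interaction: an example only earns a bonus alongside a definition.
    if {"definition", "example"} <= features:
        score += 10
    return score

def shapley(feature: str, all_features: set[str]) -> float:
    """Exact Shapley value: weighted average marginal contribution."""
    others = all_features - {feature}
    n = len(all_features)
    total = 0.0
    for k in range(len(others) + 1):
        for coalition in combinations(others, k):
            s = set(coalition)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (toy_grader(s | {feature}) - toy_grader(s))
    return total

feats = {"definition", "example"}
print(shapley("definition", feats))  # 35.0
print(shapley("example", feats))     # 25.0
```

Note that the two values sum to the grader’s full-coalition score (60), which is the efficiency property that makes Shapley values useful for auditing how much each factor drove a grade.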

Now for the exciting part: results! Can you share any data that illustrates the project’s success?

“Absolutely, I’m excited to share some compelling data that highlights the success of our projects.

The first case study is an Accounting Certification Body. In this project, we worked with a dataset of 380 student responses from non-directed case studies, each between 3 and 5 pages. Our AI system’s prowess was put to the test with 200 papers and further validated using another 200, making it a comprehensive analysis of 780 papers in total. Not only did we focus on grading accuracy, but we also developed personalised commentary feedback for the students.

The results underwent a meticulous audit and review process involving a senior marker, an independent reviewer, and a consultant. The senior marker went through 200 fresh papers that were not part of the training, test, and validation process. We compared the AI-generated results with those marked by a human facilitator, and discrepancies were resolved through arbitration by a senior marker. Initially, the AI–human agreement rate stood at 62.5%. However, after the senior marker’s arbitrations, the AI demonstrated an accuracy rate of 70.2% against the human facilitator’s results. But here’s the real success: the overall AI accuracy—the percentage of scores consistent with the senior marker—reached an impressive 88.7%. This figure exceeded our client’s expectation of an 80% consistency rate.

Case Study 2 is a Finance Certification Institute in the United States. We engaged with 12 vignettes for this client, each containing 1-2 sub-questions. We trained the AI with 200-400 scripts for each vignette and then tested its performance with 100-200 sets. The client also provided audit sets ranging from 90 to 500 scripts.

Before arbitration, 8 of the 12 vignettes were on the cusp of achieving a 90% accuracy rate, but 4 were below the 80% mark. It’s important to note that 3 of these vignettes were initially intended as a test set, which was instrumental in optimising the AI’s performance.

The average level of agreement with human markers was a noteworthy 85.73%. This figure represents the baseline agreement before arbitration, and we anticipate that the AI’s performance will only improve post-arbitration.”

Were there any metrics that surprised you post-implementation?

“What truly took us by surprise was the AI’s ability to not only match human-level marking but, in several instances, to actually surpass human graders. This was a revelation that exceeded our expectations and really highlighted the potency and effectiveness of our AI-driven essay scoring system. Moments like these reaffirm our commitment and passion for this technology.”

What did these results teach you about the industry, the users, or the technology?

“Our project’s outcomes have set off a series of transformative waves across the industry. Imagine a world where AI grading provides timely and relevant feedback – that’s what we’re creating, changing the game for students by enabling quicker progress and deeper understanding. It’s about making learning more impactful. We’re enabling credentialing bodies to offer more exams throughout the year, giving students greater control over their educational and career paths and opening new doors for them. Then there’s the potential for lower tuition costs – the efficiencies we’re introducing could lead to reduced fees, making education more accessible and inclusive.

We’re at the forefront of an industry transformation, with more organisations looking at AI to enhance grading efficiency and fairness. It’s a shift towards smarter, fairer education systems. The success of our project is also fostering a broader acceptance of AI in education and assessments, influencing how exams are designed for AI compatibility. And as AI grading becomes more commonplace, it’s likely to lead to the development of standardised practices and ethical guidelines, ensuring consistency and reliability across various educational settings. Essentially, we’re sparking transformative changes, influencing industry practices, and improving assessment methods across the educational landscape.”

What’s next for Blees AI?

“Our commitment is strong towards bridging the gap between client expectations and our current offerings. This means a deep dive into research and development and a solid partnership with Cirrus Assessment to bring an outstanding, user-friendly e-assessment platform with AI marking capabilities to life. Plus, we’re gearing up to expand our reach, build new partnerships, and elevate our marketing game. In essence, we’re set on innovating, collaborating, and driving meaningful change in the world of educational assessments.”

“Our commitment is strong towards bridging the gap between client expectations and our current offerings. This means a deep dive into research and development and a solid partnership with Cirrus Assessment to bring an outstanding, user-friendly e-assessment platform with AI marking capabilities to life.”

Can you speak about this partnership with Cirrus Assessment and Chartered Accountants Ireland?

“Absolutely, our partnership with Cirrus Assessment and Chartered Accountants Ireland is a venture we’re genuinely enthusiastic about. It’s a collaboration that brings together the best of both worlds. With Cirrus Assessment, we’re tapping into their expertise in creating user-friendly e-assessment platforms, which is crucial for us. This synergy allows Blees to integrate advanced technologies seamlessly into our offerings.

At the same time, our connection with Chartered Accountants Ireland is invaluable. They provide us with immediate feedback from end-users, which is gold for us. This direct line of communication enables us to make quick enhancements to our product, ensuring it not only meets but exceeds user expectations. It’s all about a cycle of continuous improvement – designing, researching, developing, and then going back to the drawing board informed by real-world feedback.

In essence, this partnership is all about aligning our high-quality product with the dynamic needs of our users. It’s a collaborative effort to ensure that what we create is not just technologically advanced but also perfectly tailored to the requirements of those who use it.”

Thank you so much for your time, Sojin! One last thing while I have you here: do you have any advice for organisations wishing to get started with AI essay marking? And other entrepreneurs looking to revolutionise their industry with AI technology?

“Sure! Here’s some heartfelt advice:

For organisations venturing into AI essay marking, remember that quality data is your cornerstone. Make sure you have a robust, diverse dataset to train your AI model effectively. Pay close attention to ethical considerations; transparency and fairness are crucial to maintain trust and credibility. And start with pilot projects – they’re your litmus test to refine your AI solution and gauge its effectiveness without diving in too deep too soon.

For those aspiring to make a positive impact with AI technology, start by identifying a genuine need within the industry that AI can address. Collaboration is a powerful ally — bringing together technical and domain experts. Given the dynamic and ever-evolving field of AI, continuous learning and adaptation are key. Ethical considerations and regulatory compliance are just as essential as the technology itself, and they require you to stay current as the AI space evolves. Finding a delicate balance between scalability and precision is vital when assessing how AI can contribute to long-term growth.

Always keeping the end user in mind, solutions should aim to simplify, enhance, and bring joy to their lives. Lastly, embracing the innovation cycle — iterating, improving, and remaining open to feedback — is essential.”

Cristina Gilbert
Copywriter and digital content enthusiast, Cristina is motivated by the fast-paced world of e-assessment and the opportunities online exams give students to thrive.