Interrater reliability in subjective assessments is notoriously poor. When two teachers (lecturers, professors, take your pick) mark the same essay, they rarely agree on what grade to give; I have seen grade differences as large as an 'A' versus a 'C'. What usually happens in that case is that the two get together and come to an agreement, usually a compromise right in the middle. This is what makes the work a subjective assessment: the assessors use their best judgment to decide a grade for the student.
The other kind of assessment is an objective assessment. An example is a multiple-choice test (or a math problem) where there is only a single correct answer. Either you get the answer right or you don't; there is no subjectivity in the judgment. The answer is either correct or incorrect, the student is either right or wrong, and there is nothing left to decide.
With AI, this can all go away.
Using AI to evaluate subjective assessments would remove much of the subjectivity. LLMs (Large Language Models) are trained on exactly that: enormous volumes of language, a breadth of training far greater than any human could achieve. Having been trained on hundreds of millions of pieces of written work, a decent AI model should be able to judge, with some measure of accuracy, whether a piece of work is average, above the mean, or below the mean. In addition, it should give a fairly accurate judgment of just how far above or below the mean the work sits.
I fed a piece of work into two popular AI agents, ChatGPT and Perplexity. Although the scores weren't identical, they were close enough that two human markers could easily agree and would likely not have to change the letter grade they had assigned. More importantly, AI could also evaluate the work on all of the easily measurable dimensions of writing, including the ACES (Abstract Cognitive Enablers). When I asked the agents to evaluate a couple of dimensions, they not only provided a score for each dimension but also pointed out the strengths of the work, along with suggestions for what needed to be done to improve on that ACE.
In addition, I asked the agents to rewrite the work with suggestions on how to improve the ACE under evaluation, contained in square brackets [] and embedded in the text where each improvement should go.
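To make the "close enough to agree" claim concrete, here is a minimal sketch of how two AI scores on the 100-mean, 15-SD scale could be checked for letter-grade agreement. The band boundaries and the `letter_grade` and `markers_agree` names are my own illustrative assumptions (one SD per band), not anything the agents themselves produced.

```python
# Hypothetical check: do two AI scores land in the same letter-grade band?
# Assumed scale from the post: mean 100, SD 15. The band cut-offs below
# (one SD per band) are illustrative, not taken from the experiment.

def letter_grade(score: float) -> str:
    """Map a score (mean 100, SD 15) to an illustrative letter band."""
    if score >= 115:   # more than +1 SD above the mean
        return "A"
    if score >= 100:   # at or above the mean
        return "B"
    if score >= 85:    # within -1 SD of the mean
        return "C"
    return "D"

def markers_agree(score_a: float, score_b: float) -> bool:
    """True when both scores map to the same letter grade."""
    return letter_grade(score_a) == letter_grade(score_b)

print(markers_agree(108, 112))  # True: both fall in the "B" band
print(markers_agree(112, 118))  # False: "B" versus "A"
```

Under this (assumed) banding, two graders whose raw scores differ by a few points still hand back the same letter grade, which is the kind of agreement described above.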
The prompt for the evaluation was as follows:
This evaluation should proceed with a score of 100 as the mean and 15 as the SD. The writing should be compared against a polished magazine style of writing at 145 and a 10-year-old's writing at 70.
The evaluation needs to include:
1. A 250-word piece of general feedback for the student on the overall quality of the work, along with a score as stated above.
2. A 100-word evaluation of the logical structure apparent in the article, along with a score as stated above.
3. A rewrite of the article exactly as it has been written, but with suggestions and changes to increase the rational flow of the article, placed inline in square brackets [].
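The prompt above can be assembled programmatically so the same evaluation is issued to any agent for any piece of work. This is only a sketch: the mean, SD, anchor examples, and three requirements come from the post, while the `build_evaluation_prompt` function and its structure are assumptions for illustration.

```python
# Sketch of assembling the evaluation prompt described above.
# MEAN, SD, and the anchor descriptions come from the post; the
# function name and layout are illustrative assumptions.

MEAN, SD = 100, 15

def build_evaluation_prompt(article: str) -> str:
    """Return the full evaluation prompt with the article appended."""
    return "\n".join([
        f"This evaluation should proceed with a score of {MEAN} as the mean and {SD} as the SD.",
        "Compare the writing against polished magazine writing at 145 and a 10-year-old's writing at 70.",
        "The evaluation needs to include:",
        "1. A 250-word piece of general feedback for the student on the overall quality of the work, along with a score as stated above.",
        "2. A 100-word evaluation of the logical structure apparent in the article, along with a score as stated above.",
        "3. A rewrite of the article exactly as written, with suggestions to increase its rational flow placed inline in square brackets [].",
        "",
        article,
    ])

prompt = build_evaluation_prompt("Sample essay text goes here.")
```

The resulting string would then be pasted into (or sent to) whichever agent is doing the marking.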
Overall, I am pleased with the outcome and believe that subjectivity can be largely removed from the marking of subjective assessments.
Comments
2 responses to “Objective/Subjective – Not Any More”
I hate that I think this is really cool. I'm curious: how would this work (if possible?) for something like a thesis proposal or a study? Could AI be adapted to analyze possibly new ideas and imported data directly? I'm unsure if there's research on that yet!
So I'm not sure I see where the use of AI is a way to make money in this post. I do see ways to reduce costs for university and secondary education, in that AI will produce grading results similar enough to humans' to allow fewer humans to perform grading exercises (Codiste, 2025). However, this is a cost savings to the institution. If the theory is that a grading service is offered to institutions, I could see how we could generate revenue.
Moreover, I am somewhat concerned that subjectivity is a delicate balance. While AI would apply a consistent rationale to grading a subjective paper, are we potentially missing out on points of view that might cause us to consider alternatives that bear fruitful outcomes?
Codiste (2025). https://www.codiste.com/ai-in-grading