Marking Experts

Scholars who spend their lifetimes studying a topic become experts. But do those same scholars become experts at marking students’ work?

There is good psychological research on gaining expertise, and a look at some of it shows that becoming an expert entails more than being assigned the 11:00 a.m. slot on Tuesdays for the next 12 weeks to listen to a lecturer go on and on about their favorite subject. We don’t pretend that our students gain expertise in a semester, because the acquisition of expertise is a complicated process.

First of all, it takes time. In The Cambridge Handbook of Expertise and Expert Performance, Ericsson estimates that it takes about 10,000 hours of doing something to become an expert. But it isn’t just the time element that makes you an expert (Ericsson & Lehmann, 1996); there are other elements as well.

Going by the first requirement alone, the number of years it would take to become an expert teacher is high. If I teach a class for 45 hours per semester (3 hours/week for 15 weeks), I would have to teach 222 classes to reach 10,000 hours of teaching. Over a 40-year career, that works out to about 5.5 classes per year. I know that there are lecturers who teach that much, but even at that rate it takes 40 years to become an expert. And that’s becoming an expert in teaching, not an expert in marking.

Every year, as the semester draws to a close, I have a significant amount of marking to do. It feels like it takes at least 10,000 hours every year, but if I look at it objectively (difficult to do at the end of the semester), I really spend around 40 – 60 hours each semester marking my students’ work. That means I need roughly 167 – 250 semesters of marking to reach the number of hours required for expertise. I won’t live that long, nor do I think I would want to, given the excitement involved in marking just one more piece of work.
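For anyone who wants to check the sums, here is a minimal Python sketch of the arithmetic in the last two paragraphs. The variable names are my own; the figures (10,000 hours, 45 hours per class, a 40-year career, 40 to 60 hours of marking per semester) are the ones quoted above.

```python
# Back-of-the-envelope check of the hours quoted above.
EXPERTISE_HOURS = 10_000  # the estimate quoted in the text

# Teaching: 3 hours/week for 15 weeks per class, spread over a 40-year career.
hours_per_class = 3 * 15
classes_needed = EXPERTISE_HOURS / hours_per_class
classes_per_year = classes_needed / 40

# Marking: somewhere between 40 and 60 hours per semester.
semesters_if_60_hours = EXPERTISE_HOURS / 60
semesters_if_40_hours = EXPERTISE_HOURS / 40

print(f"classes needed:           {classes_needed:.1f}")
print(f"classes per year:         {classes_per_year:.2f}")
print(f"semesters at 60 h each:   {semesters_if_60_hours:.1f}")
print(f"semesters at 40 h each:   {semesters_if_40_hours:.1f}")
```

The teaching figure comes out at a little over 222 classes, or between 5.5 and 6 per year, and the marking figure lands between about 167 and 250 semesters.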

That’s just the time element. The practice element makes me certain that there are very few real experts at marking out there. One of the hallmarks of expertise is the “…seek(ing) out particular kinds of experiences, that is, deliberate practice” (Ericsson, Krampe & Tesch-Römer, 1993). I have never had a colleague ask if they can do some of my marking just for fun, or just because they want to gain more experience. Starkes & Ericsson (1993) note that deliberate practice is one of the primary predictors of the attainment of expertise. I have a number of colleagues who have urged me towards expertise by offering me the opportunity to pick up a bit of practice with their marking loads (although I am grateful for their thoughtfulness, I have always politely declined), but I have never met anyone who has sought out opportunities to mark.

We may have expertise in our fields of study, but we certainly are not experts at marking work. Let’s look at a typical essay as an example. In an essay, we expect that the student will build a well-structured argument, use good sources, show evidence of critical thinking, throw in a bit of originality, write in a good style, use proper spelling, punctuation, and grammar, and get the content correct. That means that we are evaluating (at least) nine dimensions on a single piece of work. We expect the student to produce a multidimensionally superb piece of work that we then take in for marking. When we mark, we try, in as little time as possible (given that there are 183 papers sitting on our desks due back next Tuesday), to become expert judges of a hypercomplex problem, simultaneously evaluating multiple dimensions and awarding an appropriate level of credit for the work. And then we wonder about the abysmal lack of reliability between markers.

In addition to making this simultaneous multidimensional judgment, we are expected to provide a judgment of quality on every dimension outlined in the marking rubric for every paper we mark. Best practice tells us that the rubric should be published and available to the students in advance of the assignment. And we are under constant pressure to come up with more detailed and comprehensive marking criteria so that our students know exactly what we are looking for and can produce work that meets those criteria.

Using a categorical marking scheme (A+, A, A-, B+, B, B-…), there are about 13 performance categories to use for every dimension on the rubric. Using a point-based system, we suddenly have 100 discrete categories. Given that we are marking on a number of different dimensions simultaneously, that means that we need to hold in our heads 117 categories (13 different grades across 9 dimensions for marking) if we use a categorical scheme, and 900 categories if we use a point-based system.
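The counts above are simple multiplication, sketched here in Python. The dimension labels are my shorthand for the nine essay qualities listed earlier, and the grade counts (13 letter grades, 100 point levels) are the approximate figures used in this post rather than any official scale.

```python
# The nine essay dimensions named earlier (labels are shorthand).
dimensions = [
    "argument structure",
    "sources",
    "critical thinking",
    "originality",
    "writing style",
    "spelling",
    "punctuation",
    "grammar",
    "content",
]

LETTER_GRADES = 13   # approximate count for an A+ ... style scheme
POINT_LEVELS = 100   # a point-based scheme

# Every dimension can land on any grade, so the counts multiply.
letter_categories = LETTER_GRADES * len(dimensions)
point_categories = POINT_LEVELS * len(dimensions)

print(letter_categories, point_categories)
```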

Cognitively, we can’t do this – we have a very real limit on the number of things we can hold in our short-term memory (classically estimated at 7 plus or minus 2 items). We can pretend all we want, but the cognitive limitations are very real and mean that we can’t come anywhere near being able to do what we say we are doing. At best, we become okay at the job, and at worst, we just get through any way we can.

What we are good at is making a subjective judgment about the quality of a paper in a holistic manner. What we have to keep in mind is that the single biggest influence on our subjective judgment is the paper we just read. I went to a presentation once on this very topic, and the presenter (I can’t remember his name) had worked out the ideal system for coming up with reasonably accurate grades for students. He said that we should go through the papers and put a + or ++ or a - or -- on each paper relative to the paper we just read. When we are finished, we shuffle all the papers, read them again, and go through the entire process of putting our +s and -s on the papers, once again relative to the paper we just read. We then do the entire process a third time, add up the pluses and minuses, put the papers in the final order we have arrived at, and then decide where some arbitrary grade boundaries lie. Sorry – not something I’m going to do with every batch of marking.
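To make the presenter's procedure concrete, here is a rough Python sketch of how I understand it. Several things are my assumptions, not his: the numeric weights for the marks (+2, +1, -1, -2), the decision to give the first paper in each pass no mark, and the `compare` function, which is a hypothetical stand-in for the marker's gut reaction to a paper relative to the one before it. Setting the grade boundaries at the end is still left to the marker.

```python
import random

# Assumed weights for the four marks; the presenter did not specify any.
MARK_VALUES = {"++": 2, "+": 1, "-": -1, "--": -2}


def rank_papers(papers, compare, passes=3, seed=None):
    """Order papers from best to worst by summing relative marks.

    papers  -- list of hashable paper identifiers
    compare -- compare(current, previous) returns "++", "+", "-" or "--"
    passes  -- number of read-throughs (three in the talk)
    """
    rng = random.Random(seed)
    totals = {paper: 0 for paper in papers}
    order = list(papers)
    for pass_number in range(passes):
        if pass_number > 0:
            rng.shuffle(order)  # reshuffle before every re-read
        # The first paper in a pass has nothing to be compared with,
        # so (by assumption) it collects no mark on that pass.
        for previous, current in zip(order, order[1:]):
            totals[current] += MARK_VALUES[compare(current, previous)]
    # Highest total first; ties keep their original relative order.
    return sorted(papers, key=lambda paper: totals[paper], reverse=True)


# Toy usage: a made-up "true quality" score stands in for the marker.
quality = {"A": 90, "B": 70, "C": 50, "D": 30}


def compare(current, previous):
    gap = quality[current] - quality[previous]
    if gap >= 30:
        return "++"
    if gap > 0:
        return "+"
    if gap > -30:
        return "-"
    return "--"


ranking = rank_papers(list(quality), compare, passes=3, seed=1)
print(ranking)
```

With only four papers and three passes, the resulting order depends heavily on the shuffles, which is rather the point: the scheme relies on repeated passes to wash out the influence of whichever paper happened to come before.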

If we are never really going to become experts, and the best we are going to be able to accomplish is a broad holistic judgment, then, given the cognitive limits on the number of categories we can hold in our heads, the fairest, most reliable system for marking would be to have about five broad categories and assign grades based on those categories – excellent, good, adequate, poor, and fail.

If this is really what we are capable of, why should we pretend to do otherwise? Unless, of course, you really want to become an expert marker – in which case, I have a whole pile of papers you can practice on.