Multiple Choice Questions (MCQs) are a part of teaching in many, many contexts. Although not an ideal method of evaluating students in every situation, they can be effective in testing coverage of a broad subject domain.
Starting with a definition of an MCQ test, I am going to provide guidelines for writing multiple choice tests.
This short overview is the work of John Carneson, Georges Delpierre, and Ken Masters. Years ago, it was hosted on the University of Cape Town’s website; when it was removed, the authors published it in its own right and put it up on ResearchGate. Ken has given me permission to publish a shortened version of their work here, and I have used this shortened version for years in staff training exercises.
I will also publish another article about the scoring of MCQs and how (and why) to do simple item analysis on the questions.
MCQ is short for Multiple Choice Question. MCQs are traditional “choose one from a list” questions and usually test recognition memory. The student answers the question by selecting the correct response from among the alternatives provided. There is one correct alternative (the key), and the incorrect alternatives are called “distracters”. When more than one alternative is correct, the question is called a “Multiple Response Question”. That type of question is beyond the scope of this article, and I will not consider it further.
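MCQs are made up of three components: the stem (the question or incomplete statement to be answered), the key (the correct alternative), and the distracters (the incorrect alternatives).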
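If you keep your questions in an electronic question bank, these three components map naturally onto a simple record. The sketch below is a minimal illustration in Python, not part of the original guidance; the class name MCQItem and its fields are hypothetical. It enforces the structural side of the single-response rule: there is one key, and the key must not reappear among the distracters. It cannot, of course, tell whether a distracter is also defensibly correct; only careful writing can prevent that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MCQItem:
    """A single-response MCQ split into its three components (hypothetical structure)."""

    stem: str                     # the question or statement to be completed
    key: str                      # the one correct alternative
    distracters: tuple[str, ...]  # the incorrect alternatives

    def __post_init__(self) -> None:
        # Without at least one distracter there is nothing to choose between.
        if not self.distracters:
            raise ValueError("an MCQ needs at least one distracter")
        # If the key text also appears as a distracter, there would be two
        # "correct" responses, which a single-response MCQ must avoid.
        if self.key in self.distracters:
            raise ValueError("the key must not also be a distracter")
        # Repeated distracters waste the student's time and add nothing.
        if len(set(self.distracters)) != len(self.distracters):
            raise ValueError("distracters must be distinct")

    def options(self) -> list[str]:
        """Every alternative shown to the student: the key followed by the distracters."""
        return [self.key, *self.distracters]


# Usage, with the osteoporosis item from this article:
item = MCQItem(
    stem="Which of the following is a symptom of osteoporosis?",
    key="decreased bone density",
    distracters=("raised body temperature", "hair loss", "painful joints"),
)
print(len(item.options()))  # prints 4
```

The options are returned key-first only for simplicity; in a real test you would shuffle or otherwise vary the key’s position before presenting them.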
Here are some general considerations to keep in mind when writing MCQs:
· Test for significant learning outcomes
· Don’t test for subject matter trivia
· Test for the intended intellectual skills
· Use correct grammar
· Be sensitive to cultural and gender issues
· Avoid using interrelated items
Writing MCQ Stems
When writing an MCQ stem, present a single, definitive statement to be completed or answered by one of several given choices. The statement may refer to a preceding passage or illustration. Make sure that your stem avoids unnecessary and irrelevant material, as the following two versions of the same MCQ show:
Paul Muldoon, an Irish post-modern poet who uses experimental and playful language, uses which poetic genre in “Why Brownlee left”?
c. narrative poem
d. dramatic monologue
A better stem that asks the question directly would be:
Paul Muldoon uses which poetic genre in “Why Brownlee left”?
c. narrative poem
d. dramatic monologue
When writing an MCQ stem, use clear, straightforward language. Questions constructed with complex wording may become a test of reading comprehension rather than an assessment of whether the student knows the subject matter. The following two versions of the same MCQ demonstrate the point:
As the level of fertility approaches its nadir, what is the most likely ramification for the citizenry of a developing nation?
a. a decrease in the labour force participation rate for women
b. a dispersing effect on population concentration
c. a downward trend in the youth dependency ratio
d. a broader base in the population pyramid
e. an increased infant mortality rate
A major decline in fertility in a developing nation is likely to produce a(n)
a. decrease in the labour force participation rate for women
b. dispersing effect on population concentration
c. downward trend in the youth dependency ratio
d. broader base in the population pyramid
e. increased infant mortality rate
Use negatives sparingly. If negatives must be used, capitalise, underscore, embolden or otherwise highlight them:
Which of the following is NOT a symptom of osteoporosis?
a. decreased bone density
b. frequent bone fractures
c. raised body temperature
d. lower back pain
The following would sample the same content domain without having to resort to a negative:
Which of the following is a symptom of osteoporosis?
a. decreased bone density
b. raised body temperature
c. hair loss
d. painful joints
A common problem with MCQs is that information that could be included in the stem is repeated in each option. Put as much as possible in the stem rather than duplicating material in each of the options; this saves the students’ time when taking the test. Compare the following two versions:
Theorists of pluralism have asserted which of the following?
a. The maintenance of democracy requires a large middle-class
b. The maintenance of democracy requires autonomous centres of countervailing power
c. The maintenance of democracy requires the existence of a multiplicity of religious groups
d. The maintenance of democracy requires a predominantly urban population
Theorists of pluralism have asserted that the maintenance of democracy requires
a. a large middle-class
b. autonomous centres of countervailing power
c. the existence of a multiplicity of religious groups
d. a predominantly urban population
Writing MCQ Distracters
Writing MCQ distracters is much more difficult than writing the key. Avoid ambiguity: for a single-response MCQ, ensure that there is only one correct response. Unless you construct your test very carefully, writing several responses that could be correct and expecting students to choose the “most correct” one will lead to students who want to argue their point. This type of question is usually the result of poor design, because it is difficult to write plausible alternatives.
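Use only plausible and attractive distracters, and avoid giving clues to the correct answer. In the first version below, the article “an” at the end of the stem tells test-wise students that the answer begins with a vowel; writing “a(n)” removes the clue: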
A fertile area in the desert in which the water table reaches the ground surface is called an:
c. water hole
A fertile area in the desert in which the water table reaches the ground surface is called a(n)
c. water hole
If possible, avoid “all of the above” or “none of the above” as alternatives. If you include them, make sure that they appear as correct answers at least some of the time. Distracters based on common student errors or misperceptions can be very effective. Correct statements that do not answer the question are often strong distracters.
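If your questions are stored electronically, a quick audit can show whether “all of the above” or “none of the above” is ever actually the key. This is a minimal sketch, assuming each question is held as a hypothetical (options, key_index) pair; the function name and data layout are illustrative, not part of the original guidance.

```python
# Catch-all alternatives that, if used at all, should sometimes be the key.
CATCH_ALL_OPTIONS = {"all of the above", "none of the above"}


def catch_all_usage(questions: list[tuple[list[str], int]]) -> tuple[int, int]:
    """Return (questions offering a catch-all option, questions where it is the key).

    Each question is a hypothetical (options, key_index) pair, where key_index
    is the 0-based position of the correct option.
    """
    offered = 0
    used_as_key = 0
    for options, key_index in questions:
        # Compare case-insensitively and ignore surrounding whitespace.
        normalized = [option.strip().lower() for option in options]
        if any(option in CATCH_ALL_OPTIONS for option in normalized):
            offered += 1
            if normalized[key_index] in CATCH_ALL_OPTIONS:
                used_as_key += 1
    return offered, used_as_key


test_questions = [
    (["option A", "option B", "All of the above"], 0),
    (["option A", "option B", "None of the above"], 2),
    (["option A", "option B"], 1),
]
print(catch_all_usage(test_questions))  # prints (2, 1)
```

If the second number is zero while the first is not, the catch-all option is never the key, which is exactly what the guideline above warns against.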
Avoid using ALWAYS or NEVER in the alternatives, as test-wise students are likely to know to rule such universal statements out of consideration. Do not create distracters that are so close to the correct answer that they may confuse students who really know the answer to the question: “Distracters should differ from the key in a substantial way, not just in some minor nuance of phrasing or emphasis” (Isaacs, 1994).
Without wanting to engage in an argument over the use of Bloom’s Taxonomy (I know there are reasons not to use it), I will use the levels of the cognitive domain proposed by Bloom to show that MCQs can ask questions far more sophisticated than the memorization-based questions behind much of the flak that MCQs get.
Bloom identified six levels of learning within the cognitive domain: knowledge, comprehension, application, analysis, synthesis, and evaluation.
With careful thought, almost all levels of Bloom’s taxonomy can be examined using MCQs. Knowledge is the simplest level to examine, and the fact that many MCQs test nothing beyond it is one of the principal criticisms of using MCQs. Synthesis and creativity, however, are almost impossible to measure using MCQs.
Knowledge involves remembering specifics, universals, methods, processes, and patterns. This is the easiest level to test with MCQs, for example:
Which is the most commonly used measure of central tendency?
Some questions can be much harder than others while still tapping only the knowledge level, for example:
According to Eysenck and Eysenck (1980), in what order should the following “levels of processing” be arranged, from the fewest to the greatest number of words recalled in a memory test?
a. Non-semantic/Non-distinctive -> Semantic/Non-distinctive -> Non-semantic/Distinctive -> Semantic/Distinctive
b. Non-semantic/Non-distinctive -> Non-semantic/Distinctive -> Semantic/Non-distinctive -> Semantic/Distinctive
c. Semantic/Distinctive -> Non-semantic/Distinctive -> Semantic/Non-distinctive -> Non-semantic/Non-distinctive
d. Semantic/Distinctive -> Semantic/Non-distinctive -> Non-semantic/Distinctive -> Non-semantic/Non-distinctive
Comprehension is the ability to grasp the meaning of the material. Common learning objectives include understanding facts and principles, interpreting verbal material, classifying, and describing. At this level, knowledge is assumed, and the testing is for an understanding of that knowledge. In this kind of question, jargon or technical terms are used on the assumption that the students know the jargon and can recognise when it is appropriately used.
The advantage of a closed response format over an open-ended response format in psychometric testing is that
a. more ‘in-depth’ responses can be elicited.
b. there is a slower response rate.
c. the responses can be easily coded and analyzed.
d. it is more likely that the real attitudes and feelings of the subjects will be revealed.
In psychometric testing, scale attenuation problems might be avoided by
a. making a hard task harder.
b. making an easy task easier.
c. using counterbalancing.
d. using several dependent variables.
Application involves using knowledge that is both known and understood, for example:
For a population with mean = 100 and standard deviation = 20, the z-score corresponding to X = 110 would be:
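For reference, the key to this item follows directly from the definition of a z-score: the number of standard deviations a score lies from the mean. Writing μ for the population mean and σ for the standard deviation (symbols not used in the question itself):

```latex
z = \frac{X - \mu}{\sigma} = \frac{110 - 100}{20} = \frac{10}{20} = 0.5
```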
Which one of the following memory systems does a piano-tuner primarily use at work?
a. Short-term memory
b. Long-term memory
c. Iconic memory
d. Echoic memory
Questions at the analysis level require students to use their knowledge, comprehension, and application, and then to analyze the question in order to arrive at the correct answer. In the following example, students must be able to recall the different statistical tests (knowledge) and understand the basis for choosing each test (comprehension). They must then apply these concepts to the information supplied (application) and, finally, analyze that information in order to answer the question.
An investigator wants to determine whether a daily dose of vitamin C increases intellectual aptitude. Seventy high school students are randomly divided into two groups of 35 each, and designated to receive daily doses of 50 mg of either vitamin C or fake vitamin C (a placebo). After two months of daily doses, IQ scores are obtained. Which statistical test is most appropriate to analyze the data and answer the question?
a. Independent sample t-test
b. Chi-square test
c. Related sample t-test
At the evaluation level, students are asked to pass judgment on, for example, the logical consistency of written material, the validity of experimental procedures, or the interpretation of data.
Judge the sentence in quotation marks according to the criteria provided below:
“The United Kingdom took part in the first Gulf War against Iraq BECAUSE of the lack of civil liberties imposed on the Kurds by Saddam Hussein’s regime.”
a. The assertion and the reason are both correct, and the reason is valid.
b. The assertion and the reason are both correct, and the reason is invalid.
c. The assertion is correct but the reason is incorrect.
d. The assertion is incorrect, but the reason is correct.
e. Both the assertion and the reason are incorrect.
Writing good MCQs is difficult, and one of the most difficult aspects is coming up with plausible distracters. Several psychometric experts (an exam is a psychometric test of the knowledge and understanding of a knowledge domain) say that good exam-writing technique uses a key and three distracters. I have seen MCQ tests with 10 or even 15 options to choose from, and many teachers argue that five options are better than four. Whatever number you decide on, you need the same number of distracters for every question in the exam if you are to do appropriate item analyses (something covered in my next article).
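Because item analysis depends on every question offering the same number of options, it is worth checking this before the exam is printed. A minimal sketch, assuming each question is stored as a hypothetical list of its options (key plus distracters); the function name is illustrative only.

```python
def questions_with_wrong_option_count(
    questions: list[list[str]], options_per_question: int = 4
) -> list[int]:
    """Return the 1-based numbers of questions that do not offer exactly
    `options_per_question` alternatives (key plus distracters).

    The default of 4 reflects the key-plus-three-distracters format;
    pass 5 if you prefer five options.
    """
    return [
        number
        for number, options in enumerate(questions, start=1)
        if len(options) != options_per_question
    ]


exam = [
    ["key", "distracter 1", "distracter 2", "distracter 3"],
    ["key", "distracter 1", "distracter 2"],  # one distracter short
    ["key", "distracter 1", "distracter 2", "distracter 3", "distracter 4"],
]
print(questions_with_wrong_option_count(exam))  # prints [2, 3]
```

Passing the intended number explicitly, rather than inferring it from the most common count, keeps the check honest when a whole batch of questions has drifted from the agreed format.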
As a rule of thumb, if you are asking straightforward questions, you should plan for the students to be able to answer one question per minute. I have seen instructors write a 100-item MCQ to be taken in an hour, and others write a 15-item MCQ for an hour-long test. In both cases the instructors thought they had judged the length about right, and they were happy to have the simple guidance to plan for about one question per minute for straightforward tests.
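The one-question-per-minute rule of thumb can be turned into a quick check on a planned test. A minimal sketch; the function name is hypothetical, and the rate of one minute per question is simply the rule of thumb above, which applies only to straightforward questions.

```python
def minutes_over_budget(
    num_questions: int, test_minutes: float, minutes_per_question: float = 1.0
) -> float:
    """Minutes by which a test of straightforward questions would overrun its slot.

    Assumes the rule of thumb of about one straightforward question per minute.
    A negative result means students would finish with that much time to spare.
    """
    return num_questions * minutes_per_question - test_minutes


print(minutes_over_budget(100, 60))  # prints 40.0
print(minutes_over_budget(15, 60))   # prints -45.0
```

Applied to the two hour-long tests described above, the 100-item test would overrun by about 40 minutes, while the 15-item test would leave about 45 minutes unused.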