Writing Multiple Choice Items to Require Comprehension

by Russell A. Dewey, PhD

This page (https://www.psywww.com/selfquiz/aboutq.html) is about writing multiple choice questions that are fair but hard to guess. It might be of interest to my students (for example, to warn them away from the usual guessing strategies discussed below) or to visiting educators. Comments are welcome at psywww@gmail.com.

Multiple choice questions are widely scorned as "multiple guess" questions. Some teachers assume that multiple choice items encourage superficial studying. Perhaps it is true...other things being equal, students do not study as hard or as well for a multiple choice test. Certainly if students are expecting a multiple choice test and they receive an essay test instead, they complain.

However, essay tests have their own set of problems. For example...

Grading of essay tests is influenced by the neatness of handwriting, according to research. Yet if students are asked to do the test at home on a word processor, teachers cannot be sure the student is doing the work. (Using a computer lab so each student can work on a computer station and turn in the essay before leaving is ideal.)
Grades for essay tests are influenced by the length of the essay. If a student rambles on, there is greater likelihood of hitting a few important points. But we do not want to reward verbosity.
Essay test grades depend upon writing skills. We want students to be able to write well, but how much emphasis do we put on English skills on a test of psychology knowledge?
Essay tests focus on a few large issues. They are excellent for promoting integrative and synthetic thinking. Multiple choice questions focus on smaller chunks of knowledge and can achieve thorough coverage.
The grading of essay tests is somewhat subjective. Essay tests cannot be scored with a machine, except in some experimental AI programs that are still rare, so grading them is very labor-intensive. This makes essay tests impractical for auditorium-sized classes. A teacher determined to use essay tests in a large class must resort to a few large, "make or break" tests per term, and that puts a lot of stress on students.
Grading many term papers or long essays in one sitting can create a "pall effect" of diminished sensitivity to individual expression. A person assigned to grade 100 essay tests is very likely to employ cognitive shortcuts after a while, voluntarily or involuntarily, producing rather superficial evaluation in the end.

Despite all this, essay and short answer tests have many virtues. Students need practice formulating arguments, expressing things clearly, and integrating ideas. Nobody would argue that all testing should be multiple choice.

However, for many teachers in many situations, a good objective test is both a fairer and a more efficient way to assess learning than an essay or short answer test, especially if one embraces the goal of frequent progress checks.

I maintain it is possible to construct multiple choice questions that are not readily answered using mere familiarity with vocabulary words. Well constructed quiz items require a student to comprehend, not just recognize, basic factual material. If such items can be collected into a machine-scorable test, there are major benefits:

Students can be offered frequent quizzes over small portions of the course at a time (which has been shown to promote learning).
Feedback can be provided quickly.
Students can be offered frequent re-tests or makeup tests to reduce anxiety and promote mastery of the material.

The key to writing "good" multiple choice questions is a three-part strategy:

Specify objectives so the questions can be specific and focused and the student does not have to guess what to study.
Reduce frustration for creative students by reducing ambiguities (such as "both a & b" type answers).
Defeat the "test-wise" strategies of the student who has not studied and is attempting to guess answers.

Specify objectives or give study questions

In my opinion, students should not be forced to guess what will be on a test ("psych out" the teacher) to decide what to study.

Educational research shows less able students are most affected when a teacher fails to spell out what must be studied to do well on a test. The poorer students tend to guess wrong. The more able students are better at sensing what the teacher wants. Therefore students most in need of help are likely to flounder even more painfully if they must guess what to study.

The obvious solution to this problem is to give students specific study questions, then draw upon that pool of study questions when constructing test items. Sometimes people criticize this as "teaching the test," but what is the alternative? Surprising students by asking questions they never realized they were expected to answer?

Study questions can encourage a superficial approach if a superficial approach is enough to answer the questions. Well constructed questions, defined here as questions requiring real comprehension of the material, do not reward superficial forms of studying such as skimming.

As for the complaint that giving students study questions results in "teaching the test", the solution is to offer study questions for all the most important ideas in an assignment. Then teaching the test is teaching the course.

Teaching the Test (so to speak) was actually part of the system in Keller Plans, an independent study method popular in the 1970s. Students were given lots of specific objectives or study questions, and they received many opportunities to take quizzes covering that material. The results were very good. In fact, Keller Plans were one of the few educational innovations in the entire 20th Century that produced better results, consistently, than traditional lecture/discussion methods of instruction.

Reduce frustration for the creative student

I recommend avoiding "all of these," "none of these," and "both a & b" answers. Over the years I noticed that my best students disliked them the most. The more a person knows about a subject matter, the easier it is to make arguments in favor of answers somebody else might regard as wrong.

True/false questions are the worst in this regard. Often the truth value of an isolated statement is quite debatable! It all depends on how it is interpreted, the definition of a key term, or complexities of context.

True/false questions also raise the problem of response bias. Some people are consistently more inclined to answer True or False. When options like "both a & b" are included in multiple choice items, students are put in the position of making True/False evaluations.

Better to have four or five alternatives with only one correct. The rationale is the same as the rationale for using forced choice procedures in studying sensory perception: they eliminate response bias (a tendency to say Yes or No more frequently).

Defeat the "test-wise" strategies of students who don't study

The whole point of testing is to encourage learning. If students can guess the answers to a test, they will not study for it. To motivate students to study and learn, one must design quiz items that are not easily guessed without good studying. One must also design a test so that answers are not obvious to the student who has merely skimmed the assignment or studied only highlighted words or read only summaries.

To encourage high-quality studying, one must defeat the common rules of thumb students use to guess correct answers. When these are eliminated, learning the material becomes the easiest way to pass a quiz.

Rule of thumb: "If in doubt, guess." To minimize the impact of this strategy: use five alternatives instead of three or four.
Rule of thumb: "Pick the longest answer." To defeat this strategy: make the longest answer correct about a fifth of the time (if there are five alternatives for each question).
Rule of thumb: "Pick the 'b' alternative." To defeat this strategy: make sure each answer is used the same number of times.
Rule of thumb: "Do not pick the same alternative more than twice in a row." To defeat this strategy: use randomly generated answer positions. I did this, and sometimes there were runs of 3 or 4 of the "same answer" on a quiz, purely by chance.
Rule of thumb: "Never pick an answer that uses the word 'always' or 'never' in it." To defeat this strategy: make sure "always" and "never" answers are correct about a fifth of the time.
Rule of thumb: "If there are two answers that express opposites, pick one or the other and ignore other alternatives." To defeat this strategy: sometimes offer opposites when neither is correct.
Rule of thumb: "Pick the scientific-sounding answer." To defeat this strategy: use scientific-sounding jargon in wrong answers.
Rule of thumb: "Don't pick an answer which is too simple or obvious." To defeat this strategy: sometimes make the simple answer the correct one.
Rule of thumb: "Pick a word which you remember was related to the topic." To defeat this strategy: when drawing up distractors (wrong answers), use terminology from the same area of the text as the right answer, but use those words incorrectly so the wrong answers are clearly wrong.
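
Several of the countermeasures in this list (random answer positions, balanced use of each letter, the longest answer being correct about a fifth of the time) can be checked mechanically once a quiz is drafted. The following is a minimal sketch in Python, not the procedure I actually used. The function names and the data layout (a list of alternatives per item plus a list of correct letters) are assumptions made for illustration.

```python
import random
from collections import Counter

LETTERS = "abcde"  # five alternatives per item


def place_answers(correct, distractors, rng):
    """Shuffle one correct answer and its distractors into random positions.

    Assumes all alternatives have distinct wording.
    Returns (alternatives, letter_of_correct_answer).
    """
    alternatives = [correct] + list(distractors)
    rng.shuffle(alternatives)
    return alternatives, LETTERS[alternatives.index(correct)]


def longest_run(key):
    """Length of the longest streak of the same letter in an answer key."""
    best = current = 0
    previous = None
    for letter in key:
        current = current + 1 if letter == previous else 1
        best = max(best, current)
        previous = letter
    return best


def audit(items, key):
    """Summarize cues a guesser might exploit in a finished quiz.

    items: one list of alternatives per item; key: list of correct letters.
    A correct answer that ties for longest is counted as "longest".
    """
    longest_correct = sum(
        1
        for alts, letter in zip(items, key)
        if len(alts[LETTERS.index(letter)]) == max(len(a) for a in alts)
    )
    return {
        "position_counts": Counter(key),
        "longest_run": longest_run(key),
        "longest_answer_correct": longest_correct / len(key),
    }
```

Purely random placement does not guarantee that each letter is used equally often on a short quiz, so the position counts are reported for review and not enforced. A run of three or four identical letters in the report is expected now and then and is not a reason to reshuffle.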

The criterion of success in writing a fair test item is simple. A student examining the item, while the book is open and turned to the relevant page, should agree the item is fair.

A Test Construction Procedure

When teaching auditorium-sized introductory psychology classes based on my textbook (the same one now online) I used the study questions ("quickcheck" questions) that were in the margins of each page, next to the relevant material, in the print version.

These study questions covered virtually every important concept from the chapter. This resulted in about 110 study questions per chapter. Samples are found in the self-quiz section on Psych Web.

The density of study questions must reflect the academic system. During the first decade and a half that I worked at Georgia Southern, our university was on the quarter system, which meant that classes met daily, and academic terms were short (9 or 10 weeks). A typical student took three classes. Back then, I used about 160 study questions per unit, and students handled it well.

Then we switched to the semester system. Now students took five classes per term, and each class met only two or three times per week. I switched to 110 questions per week. The reduction was roughly proportionate to the reduced time spent on each class in the semester system. I came to feel 110 questions was about right for complete coverage of a 50 page textbook chapter.

I based my quiz item pool directly upon the study questions. To draw up a quiz item, I would start with the study question. If I could not come up with a good test item for a study question, I deleted the study question. As recommended above, I used questions with five alternatives, rather than four. That reduced the likelihood of guessing the correct answer.
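
The effect of a fifth alternative can be estimated with a little arithmetic. A blind guess is right with probability 1/3, 1/4, or 1/5 when there are three, four, or five alternatives. The Python sketch below turns that into the chance of reaching a pass mark by guessing alone. The 20-item quiz, the pass mark of 12, and the function name are hypothetical values chosen for illustration, and the calculation assumes every item is guessed independently.

```python
from math import comb


def chance_of_passing_by_guessing(num_items, num_alternatives, pass_mark):
    """Probability of getting at least pass_mark items right by blind guessing.

    Assumes items are independent and each alternative is equally likely to be
    picked, so the number correct follows a binomial distribution.
    """
    p = 1 / num_alternatives
    return sum(
        comb(num_items, k) * p**k * (1 - p) ** (num_items - k)
        for k in range(pass_mark, num_items + 1)
    )


# Hypothetical 20-item quiz with a pass mark of 12 (60 percent).
for n_alt in (3, 4, 5):
    expected = 20 / n_alt  # average number right from guessing alone
    chance = chance_of_passing_by_guessing(20, n_alt, 12)
    print(f"{n_alt} alternatives: expect {expected:.1f} right, "
          f"P(pass by guessing) = {chance:.5f}")
```

The calculation covers only the student who guesses on every item. A student who can rule out one or two distractors has better odds, which is one more reason to write distractors that sound plausible.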

I avoided "all of these" or "none of these" or "both a & b" type answers for the reasons discussed above (I found that excellent students often came up with creative reasons to pick the wrong answers).

Often I had to resist the temptation to write "both a & b" after generating two or three or four distractors and stalling. It would be so easy! But, on principle, I persisted until I had five different answers, four distractors and one answer that was clearly correct, for each item.
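
A rule like this is easy to enforce automatically if the item pool is kept in a file. The following Python sketch checks one item for the problems described above: the wrong number of alternatives, duplicated alternatives, and catch-all options. The function name, the data layout, and the exact phrases matched by the pattern are assumptions for illustration, not part of my own procedure.

```python
import re

# Catch-all wordings to reject. The list of phrases is illustrative.
BANNED = re.compile(
    r"\b(?:all|none) of (?:these|the above)\b"
    r"|\bboth\s+[a-e]\s*(?:&|and)\s*[a-e]\b",
    re.IGNORECASE,
)


def check_item(alternatives, correct_index, expected_count=5):
    """Return a list of problems found in one multiple choice item."""
    problems = []
    if len(alternatives) != expected_count:
        problems.append(
            f"expected {expected_count} alternatives, found {len(alternatives)}"
        )
    # Compare wording case-insensitively so near-identical options are caught.
    normalized = [a.strip().lower() for a in alternatives]
    if len(set(normalized)) != len(normalized):
        problems.append("duplicate alternatives")
    if not 0 <= correct_index < len(alternatives):
        problems.append("correct_index does not point at an alternative")
    for text in alternatives:
        if BANNED.search(text):
            problems.append(f"catch-all option: {text!r}")
    return problems
```

An empty list means the item passed these mechanical checks. Whether each distractor is clearly wrong yet plausible still has to be judged by a person.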

I used quotation marks and scientific sounding jargon in wrong answers, just as often as I did in correct answers. Those were not effective cues to a correct answer.

To generate hard-to-guess distractors, I used each study question as a starting point to generate plausible sounding alternatives. Each distractor was supposed to be clearly wrong...but worded so that it might sound correct to a poorly prepared student.

Each week I did an item analysis on the answers to my quiz items (since I had all the answer data in a spreadsheet) to see which questions were missed by top-level students. If A level students missed an item, that was a red flag. The item was too difficult or ambiguous.
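
I did this in a spreadsheet, but the same check is short in code. The Python sketch below flags items that too many top students missed. It is an illustration under stated assumptions: "top students" are taken to be the highest scorers on this one quiz (a stand-in for A level students), and the two thresholds and the function name are hypothetical.

```python
def flag_items(responses, key, top_fraction=0.2, max_miss_rate=0.25):
    """Return indexes of items that too many top-scoring students missed.

    responses: one list of chosen letters per student, same length as key.
    key: list of correct letters.
    top_fraction and max_miss_rate are illustrative thresholds.
    """
    if not responses:
        return []
    scores = [sum(c == k for c, k in zip(row, key)) for row in responses]
    # Rank students by total score and keep the top slice (at least one).
    ranked = sorted(range(len(responses)), key=lambda i: scores[i], reverse=True)
    top = ranked[: max(1, round(len(ranked) * top_fraction))]
    flagged = []
    for item, correct in enumerate(key):
        misses = sum(responses[s][item] != correct for s in top)
        if misses / len(top) > max_miss_rate:
            flagged.append(item)
    return flagged
```

A flagged item is only a candidate for review. The decision to reword or delete it still depends on reading the item next to the text.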

I also looked at the answer data to see what distractor the students were tending to select (if there was a tendency) and why it tempted them. Then I would change the item to remove the ambiguity or temptation.
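
The distractor check can be sketched the same way. The Python below tallies the wrong answers chosen for one item and reports the most popular one. The function names and the data layout (one list of chosen letters per student) are assumptions for illustration.

```python
from collections import Counter


def distractor_counts(responses, key, item):
    """Count how often each wrong alternative was chosen for one item."""
    correct = key[item]
    return Counter(row[item] for row in responses if row[item] != correct)


def most_tempting(responses, key, item):
    """Return (letter, share_of_wrong_answers) for the most chosen distractor,
    or None if nobody missed the item."""
    counts = distractor_counts(responses, key, item)
    if not counts:
        return None
    letter, n = counts.most_common(1)[0]
    return letter, n / sum(counts.values())
```

With four distractors, wrong answers spread evenly would give each one a share of about a quarter. A share well above that suggests the distractor is tempting for a reason worth finding.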

The theory behind this was that if, for some reason, a question was inscrutable to the top students in the class, there was something wrong with the question. Perhaps the material was not explained very well in the textbook. In any event, the question had to be fixed. I would make a note of the offending item and change it or delete it.

I found that I wrote better questions if I generated quiz items with the book closed. I just worked from the same list of study questions the students had available to them. The alternative, writing quiz items while I looked at the material in the text, made it more likely I would write a picky question that required students to have a photographic memory.

With the book closed, I had to rely on my own memory of the material. I figured if I could not remember something myself, it was not reasonable to ask students to remember it. This meant I had to double-check later to make sure my own memory of the material was correct! But it was worth the trouble because the resulting questions were more reasonable.

The result of this whole procedure should be a list of quiz items that are hard to guess unless the student truly understands the material. My validation for this procedure was informal. It was based on having students visit during office hours to review a quiz if they got a low score and were curious about what they had missed.

I would pull the quiz for them and let them sit down with it using a spare chair in my office. Each quiz item had a page number from the text, making it easy for them to locate the relevant material in the textbook.

In most cases, when students opened the book and compared the quiz item to the material in the text (including the relevant study question), they agreed the quiz item was fair. But sometimes they did not. If they could make a halfway-decent argument that a quiz item was unfair or ambiguous, with the evidence in front of both of us, I would give them the disputed point and agree to change the item. And I did.

Write to Dr. Dewey at psywww@gmail.com.
