College Quarterly
Spring 1996 - Volume 3 Number 3
Improving Evaluation of Student Performance
by Leo Spindel

Many students react with trepidation to the experience of writing tests, examinations and assignments. Nevertheless, however maligned grading and evaluation may be, it should come as no surprise that student feedback on course evaluations indicates a need for assessment, and for more frequent assessment. In fact, students are often critical of instructors who do not test enough.

Grades are important to students; very few drop out of courses in which their performance is satisfactory, and instructors should capitalize on this. Students are aware that academic performance and achievement are highly valued and, hence, will for the most part do what is necessary to obtain the best possible grade.

It is important to demystify testing and remove its negative connotations. A test should not be seen as a threat because, if it is, students merely cram for the test and quickly forget what they have learned. Instructors can reinforce the value of testing by clearly explaining the grading process, giving encouraging comments for achievement, and providing clear statements on how to improve performance. More importantly, instructors need to emphasize their role as learning facilitators or catalysts who are there to help students do the best they can.

Test Types

The type of testing an instructor ultimately decides upon should match the learning outcomes of the course. Once it is determined what skills and attitudes students must successfully demonstrate upon completion of the course, methods of evaluating these performances need to be addressed. Tests should also take into consideration the course material being taught. For example, using essay questions in a course in computer programming may cause undue stress; an objective test would seem more fitting. On the other hand, essay questions would seem more appropriate in a course in political science. It is, of course, desirable to vary testing procedures no matter what the course content. By doing so, instructors can tap into several learning domains, including knowledge, analysis and evaluation.

Subjective or Objective Testing

Subjective tests (essay or short answer) may be appropriate when the information being evaluated encourages expression of values, opinions, explanations, or interpretations; in other words, responses requiring students to display thinking and writing skills rather than simple rote memory. Objective tests (multiple-choice or true-false) are best used when measuring factual information such as statistical data, names and dates. There are, however, no absolutes here. It is possible to develop sophisticated multiple-choice questions which in fact measure higher levels of learning. Consider the following.

An instructor in an introductory psychology course administered a 150-item multiple-choice comprehensive final examination. The items ranged in difficulty from easy to hard. Which of the following would be the best procedure for determining the reliability of the test?

  a. the coefficient of correlation;
  b. the coefficient of stability;
  c. the reliability reference scale;
  d. split-half reliability.

In this example, students are asked to evaluate the merits of four different procedures. It is incumbent upon learners to understand the methods and engage in a process of elimination to determine the suitability of each choice. Knowledge of the procedures alone would not necessarily lead to the appropriate answer. Here, then, is a multiple-choice question that taps the learners' evaluative skills and whose answer does not rely strictly on the recall of information.
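
For a test administered only once, the appropriate choice is presumably split-half reliability, since a coefficient of stability would require administering the test a second time. As an illustrative aside that goes beyond the article, the computation is simple enough to sketch. The following Python fragment (with an invented item-response matrix, since no real data accompany the example) correlates each student's odd-item and even-item half-scores and applies the Spearman-Brown correction to estimate the reliability of the full test.

  # A minimal sketch, not from the article: split-half reliability
  # with the Spearman-Brown correction. The scores below are invented
  # for illustration (1 = correct, 0 = incorrect).

  def pearson(x, y):
      # Coefficient of correlation between two lists of half-scores.
      n = len(x)
      mx, my = sum(x) / n, sum(y) / n
      cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
      var_x = sum((a - mx) ** 2 for a in x)
      var_y = sum((b - my) ** 2 for b in y)
      return cov / (var_x * var_y) ** 0.5

  def split_half_reliability(scores):
      # Split the items into odd and even halves, correlate the
      # half-scores, then step the result up to full test length.
      odd = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
      even = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
      r_half = pearson(odd, even)
      return (2 * r_half) / (1 + r_half)         # Spearman-Brown

  # Five hypothetical students, six items each.
  scores = [
      [1, 1, 1, 0, 1, 1],
      [1, 0, 1, 1, 0, 1],
      [0, 1, 0, 1, 1, 0],
      [1, 1, 1, 1, 1, 1],
      [0, 0, 1, 0, 0, 1],
  ]
  print(round(split_half_reliability(scores), 2))  # prints 0.83

The correction is needed because each half is only half as long as the real test, and a longer test of comparable items is more reliable than either of its halves.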

Determining What Kinds of Tests to Use

In determining what types of tests might be most appropriate, instructors should consider the following. Are students required to:

  • memorize facts?
  • apply principles?
  • think or remember?
  • be creative?
  • organize material?
  • evaluate information?

Other Assessment Strategies

Students will feel more a part of the teaching and learning process if they can play an active role in the evaluation of course material. Here are some suggestions:

  • Let students write an exam question and answer it as part of the examination;
  • “Give away” the exam (essay type); e.g. four of seven assigned questions will be asked on the exam;
  • Use open-book exams (provided questions are not strictly factual and answers cannot be found by turning to the appropriate page in the text);
  • Add bonus questions;
  • Include a group exam component where students can pool their answers (this encourages interaction).

Guidelines for Constructing Subjective Tests

Questions should be precise. For instance, an answer to the question, “Explain the differences between subjective and objective tests”, is preferred to one for “Discuss subjective and objective testing.” Questions requiring reasoning or the application of knowledge are recommended over those requiring factual recall. An answer to the question, “What were the practical effects of the decision to make this a mutual rather than a stock company?” is preferred to, “Name three characteristics of a mutual as contrasted to a stock company.”

Instructors are encouraged to indicate clearly how fully a question is to be answered and its grade value in relation to the overall test. For example, the question “In fifty words or less, describe” might be followed by “(Value: ten marks).” Questions should be checked carefully for clear writing. Do not say, “How would you determine that your car's engine failure is due to a faulty ignition?” if you really mean, “What is the recommended way to determine etc.”

Questions requiring long answers should be broken into several smaller ones so that each can be answered briefly. Before grading papers, instructors should prepare an answer key indicating what an answer must include in order to earn maximum credit. Grade one question across all the papers rather than an entire paper at once. It is also advisable, after all papers have been graded on one question, to look back at the first three or four graded; interestingly, grades often end up being changed after such comparisons.

Guidelines for Constructing Objective Tests

There are many types of objective tests, including multiple-choice and true-false. Multiple-choice tests tend to be the more popular and will be discussed here. The language of test questions should be sensitive to the diversity of learners. Questions must be constructed so as to avoid biases of gender, culture and age, to name a few. For example, consider the following multiple-choice question that might be offered to a journalism student (Jacobs and Chase, 1992):

The famous World War II journalist who was killed on Iwo Jima was:

  a. Jim Nabors;
  b. Gomer Pyle;
  c. Ernie Pyle;
  d. Jim Carter.

This question uses humour to relax students and encourage them to see the test in a less stressful context. Research has shown that the inclusion of humorous items in test questions has no significant impact on test performance. Some students welcome the humour because it makes the test easier (they know that one or more of the options is definitely not correct). But is this question really humorous?

If students' history in North America dates back a generation or more and they were avid watchers of television sitcoms, they might remember that Jim Nabors played Gomer Pyle in the TV series Gomer Pyle, U.S.M.C. and, earlier, in The Andy Griffith Show. If, however, these shows predate many students (and they will), if students never favoured sitcoms, or if they were never privy to North American popular culture in the 1960s and early 1970s, there is nothing especially humorous about this question. In fact, many learners would take each option very seriously. The writer's attempt at humour also trivializes the question. Options 'a', 'b' and 'd' might better be replaced with the names of some real World War II journalists.

Time should not be wasted on testing for details the learners need not know. Only the knowledge, skills and attitudes that will make a difference in the effectiveness of the learner should be the objects of evaluation. For example:

World War II broke out in Poland in September of ______ .

  a. 1945;
  b. 1919;
  c. 1939;
  d. 1956;
  e. 1941.

The question asks the student to recall a date in history. Would a correct answer here really contribute to a student's knowledge or learning? Might not a question addressing the broader issue of the significance of the outbreak of World War II in Eastern Europe be more fitting?

Or, consider this question (Hurst, 1988):

The best heating device for applying solder to metal is:

  a. an alcohol blowtorch;
  b. a gasoline blowtorch;
  c. an oxyacetylene blowtorch;
  d. an electric arc welder.

Option “d” above could be improved if the word “torch” appeared, making it comparable to the other three options in wording. But, as long as the wording does not suggest the answer or disqualify the alternative for obvious reasons, such an alternative may be acceptable.

As much of the item as possible should appear in the “stem” of a multiple-choice question rather than repeating words in each option. For example (Hurst, 1988):

A Poor Item

The proper blend and pressure of acetylene and oxygen:

  a. burns with a soft, breathy sound;
  b. burns with a crackling sound;
  c. burns with a sharp hissing sound;
  d. burns with a harsh, blowing sound.

A Better Item

The proper blend and pressure of acetylene and oxygen burns with a sound that is:

  a. soft and breathy;
  b. crackling;
  c. sharp and hissing;
  d. harsh and blowing.

Another suggestion: when words such as “not” or “all except” appear in the stem of a question, these negatives should be emphasized by bolding or underlining so that students do not overlook them. Words or phrases in the stem of a question which may give away the correct option should also be avoided. The following illustration contains two such tip-offs.

A Poor Item

Light-gauge metals can best be welded by rotating the torch flame in a:

  a. oscillating motion;
  b. eccentric pattern;
  c. up and down motion;
  d. circular motion.

Options 'a', 'b' and 'c' above are eliminated because they do not fit with the indefinite article 'a' at the end of the stem. To provide for either a consonant or a vowel as the first letter of a choice, the stem should read: 'Light-gauge metals can best be welded by rotating the torch flame in a(n):' In addition, option 'd' is the only alternative whose motion fits the term 'rotating' in the stem.

Finally

In preparing any test, instructors should first identify important points and then devise test items on those points; they should not start by looking for ideas that can be tested easily. The value of a test lies in how well it measures the learners' knowledge of the things they need to know to do their jobs effectively upon graduation. Therefore, testing unimportant details or asking trick questions defeats the purpose of the test. Before using any test item, it is advisable to have two or three colleagues look it over to see if they agree that it covers a worthwhile bit of knowledge, that it is easy to read and understand (although not necessarily to answer), and that the indicated correct answer is really correct. This precaution can prevent not only unreliable test results but also dissatisfaction among learners.

References

Hurst, W. (1988). Instructional Skills Workshop Kit. Castlegar, BC: Selkirk College.

Jacobs, L.C. and Chase, C.I. (1992). Developing and Using Tests Effectively: A Guide for Faculty. San Francisco, CA: Jossey-Bass Publishers.


Leo Spindel is Manager of the Staff Resource Centre at George Brown College in Toronto, Ontario.