College Quarterly
Spring 1997 - Volume 4 Number 3

Reducing Bias In Evaluation, Part II: Strategies to Overcome Bias.

by Cathy Coulthard

As the instrument we use to measure students' acquisition of the knowledge, skills and attitudes expected from curriculum outcomes, evaluation is both an essential component of the curriculum process and an integral part of good teaching. The first part of this discussion of evaluation bias, The Sources Of Bias (in the Spring 1997 CQ), identified possible barriers to fairness commonly found in the evaluative methods used by college faculty. But understanding the likely sources of potential inequity and unreliability is only a precondition for recognizing them, and not in itself a strategy for remediation. This second segment focuses on how evaluation bias might be overcome.

We know that reliable evaluation can be limited both by instructors' ability to objectively evaluate student learning, and by the conscious or unwitting introduction of evaluation bias. While many college instructors have exclusive responsibility for evaluating their students' performance, the research literature suggests that most teachers find it very difficult to independently develop equitable evaluation strategies (Neff, 1989), and shows that it is unreasonable to assume that classroom teachers can routinely evaluate their students with any demonstrable consistency (Nottingham, 1988; Hughey and Harper, 1983).

We also know that despite the maxim that the formal training of all community college instructors should include "at least one course in student evaluation methods," most instructors do not receive training in test construction and other forms of evaluation (Neff, 1989: 18), that design of student projects and tests through which student achievement is assessed is largely a matter of individual instructor choice, and that few college instructors have had specific training in the design of methodologically reliable and valid evaluation strategies (Chase and Wakefield, 1984).

It is this author's view that (a) formal examination of instructors' evaluation strategies should be given careful attention in an annual and accumulative professional evaluation which should be contractually mandated for all college teachers, and (b) as part of individual professional development plans and the development of college resources to improve teaching, all instructors should be asked to identify the (informal or formal) means by which they will continue their education in appropriate methods of student evaluation.

That viewpoint assumes that college teachers, administrators, and students all share a community of interest in supporting processes which promise better assessment of student evaluation. There seems little doubt that in most community colleges, some specific initiative is needed to promote more reliable and equitable evaluation. In current assessments for professional promotion, retention, or tenure, college instructors' skills in test construction and other evaluation methods frequently receive little attention. Designated colleagues or administrators usually observe only a formal classroom situation, and routinely avoid classroom visits on those occasions in which a test or other evaluation exercise is planned, because that would be a "waste of time" (Neff, 1989).

That practice does not square with the notion that only through carefully examining evaluation procedures can teachers determine whether the tests used are fair and reliable measures of what students have learned (Neff, 1989). More importantly, it fails to recognize student evaluation as an integral part of both teaching and curriculum, and provides no systematic mechanism for redressing evaluation bias. Demonstrating that students have achieved the curriculum outcomes expected of them requires confidence in the methods of student evaluation, and that seems best accomplished by a periodic professional examination of how instructors evaluate their students' performance.

Of course, such developments are not without some foreseeable difficulties. To include an examination of evaluation methods in a professional appraisal might be seen as an incursion on the academic freedom to manage instruction as instructors choose (Chase and Wakefield, 1984). But it is arguable that such appraisals are the single most reliable means of ensuring professional accountability. Professional self-governance implies not only common principles and standards, but that some means should exist to regularly assess their application. With appropriate safeguards (e.g., reliance on peer review) in place, it hardly seems threatening to periodically enquire as to whether the methods chosen to measure what students have learned are useful and reliable indicators of intended curriculum outcomes.

It is also arguable that since student appraisals of college teaching are now so often used to help determine promotion, tenure and merit-pay, faculty will benefit from a periodic and formal review of evaluation methods which would require students' surveys to be equally clear, systematic, and methodologically reliable. Students will also benefit, because they could have more confidence in the process by which they are evaluated. That is because an appraisal process which requires weighing evaluation methods against desired course learning outcomes limits an instructor's ability to use grades as weapons, helps provide clear appellate standards, and impedes the unfortunate practice of defensively assigning uniformly high grades regardless of student performance, in efforts to increase student demand for particular courses or to deliberately skew the results of student opinion surveys (DuCette and Kenney, 1982).

In sum, if the purpose of evaluation is to effectively measure the degree to which a learner has acquired the knowledge, skills and attitudes intended by the goals of curriculum, a formal assessment of evaluation methods is required to hold the instructor responsible for the methods chosen, and that assessment is best accomplished through an annual professional appraisal. But this requirement would be of little help if instructors were not encouraged to continuously refresh their skills in and augment their knowledge of evaluative methods.

A greater awareness of appropriate and effective methods of evaluation is especially important to remedy the practices of instructors who award grades based on student attitudes or behaviours which, on consideration of the curriculum, have little or no bearing on learning. For example, the literature recognizes poor teaching practice in the kind of disciplinary action which reduces student grades for chewing gum in class or failing to return library books (Nottingham, 1988). Lack of formal training in evaluation and the poor lessons of imitative experience are important barriers to improving evaluation. Anticipation of a regular appraisal of their teaching which includes testing would encourage teachers to become more familiar with current methods through the literature, workshops or other resources, and provide incentive to review the utility of evaluative methods by systematically examining their explicit relation to the desired learning outcomes of the curriculum. The process of constructing good evaluation instruments begins by asking "What measure(s) are likely to provide the most valid assessment of [that] learning outcome?" (Kemp, Morrison and Ross, 1996).

Of course, not everything taught is the subject of testing (Nottingham, 1988), and teachers must first decide what items and issues should be included in evaluation. If development of appropriate evaluative instruments begins with a review of the learning outcomes sought in the course, evaluation more clearly becomes a purposive exercise in assessing students' knowledge, skills, and behaviour (Kemp et al., 1996). That sequence enables the evaluation method on which an instructor ultimately decides to closely correspond to the stated learning outcomes of the course (Spindel, 1996).

If assessment tools must square with the outcomes defined by curriculum (Chase and Wakefield, 1984), then instructors must identify what students can reasonably be asked to demonstrate upon completion of the course before determining what methods are best to evaluate student performance. Instructors should also maintain consistency between their methods of instruction and the evaluation procedures used: if the chief method of instruction is one of application rather than analysis, then the method of evaluation must also be one of application. In judging evaluation strategies, it is an important principle that both instruction and evaluation be applicable to the curriculum outcomes sought. Instructors who teach a variety of courses should be aware that they might need to use several different methods of student evaluation, the choice depending on both the nature of the subject (Neff, 1989) and the nature of the students being taught.

The next step in achieving unbiased evaluations is to closely examine the format and content of the methods chosen to uncover hidden biases. The format chosen for design and grading criteria should be examined to ascertain that the evaluative procedure is suitable for the method of instruction, and that both address the desired course outcomes. In deciding on an evaluative format, instructors also should recognize diversity in students' experience of evaluation methods, and should ensure that their students can understand the linkage between how they are being evaluated, and what knowledge or skills are represented by stated learning outcomes (Kemp et al., 1996).

Bias also can be created by the assumption that the same products and standards can be used for all students in the class (Chase and Wakefield, 1984), and instructors need to consider the specialized needs of individual students. For example, the needs of visually-impaired students can be accommodated by administering tests orally, in large print, or in braille. Hearing-impaired students might need to receive information in the signing formats (e.g., ASL) with which they are familiar, and learning-disabled students might need only extended time to finish their tests (Neff, 1989). Likewise, the use of particular products or techniques might be necessary to avert bias in testing students with physical disabilities, foreign students, or those who are members of minority cultures (Chase and Wakefield, 1984). Instructors can reduce the possibility of evaluation bias through utilizing testing facilities, where available, specifically designed for the purpose of meeting the needs of diverse student populations.

Another method of averting evaluation bias is to carefully review test content to excise formats, language use, and references which might confuse or offend students, the ideal being to present questions in terms which are universally representative and familiar to those students (Wells, 1994) and sensitive to their diverse cultures and varied backgrounds. In practice, that requires care in checking for clear writing, in framing questions in simple terms, in using frequent synonyms and examples to help clarify the questions' intent, in constructing questions in positive sentences (as the negative form is often overlooked), and in avoiding multi-concept questions, because students unfamiliar with that structure often leave such questions incomplete.

Construction of evaluation questions and statements should consciously avoid biases of gender, culture and age through careful neutrality in (non-sexist, non-stereotypical) language (Spindel, 1996), and the use of examples and references should reflect a variety of student interests and commonly-understood themes (Peterson, 1989). In practice, both the design of tests and post hoc assessments of evaluation should be guided by the maxim that students should not be penalized for poor performance on questions which were poorly asked, or made more difficult to answer by the teacher's failure to provide suitable instructions or communicate the question's intent (Nottingham, 1988).

To help control for bias created by teachers' assumptions of learners' prior knowledge, teachers might begin by asking themselves whether the evaluation process tests appropriately for the knowledge expected. Time should not be wasted on testing for details of knowledge which the learners need not know in order to demonstrate satisfactory learning outcomes in the course (Spindel, 1996), although instructors' grading should reward extra efforts evidenced by students who have mastered those additional details. In sum, instructors need to assess their methods of evaluation both for validity in representing desired course outcomes and for practicality in eliciting evidence of learning from their students.

Bias in evaluation also can be countered by providing clear conceptual links among all the components of the curriculum, including learning outcomes, methods of instruction, and methods of evaluation. Instructors might consider providing students with direct practice in the type of evaluation they should expect and coaching students who demonstrate difficulty with the evaluation process. This enriches learning and provides a common classroom experience which helps to offset diversity in educational background. Above all, the linkage of curricular components must be reflected in the information provided to students regarding what is expected of them in preparation for evaluation, and should be communicated by standardized-format course outlines and/or detailed sets of objectives and outcomes (Spindel, 1996; Neff, 1989).

Evaluation methods should be congruent with teaching methods and both should address desired learning outcomes (see Kemp et al., 1996) in ways which are transparent to students. Sample preparatory examinations, classroom discussion of problems, public recognition of good work, and recitation classes in which detailed expectations can be reviewed all assist students and can alleviate anxiety; students who are more familiar with evaluative methods are more likely to produce test performance which is a valid measurement of intended learning (Neff, 1989). In order to ensure consistency between method of evaluation and learning outcomes, teachers should advise students in quite explicit terms as to what variables will be used to make judgments in grading. Those variables should be clearly stated in terms which are understandable and defensible in open discussion with students (Nottingham, 1988).

A last point is that instructors should carefully examine their methods of evaluation to consider whether they are consistent. Spindel (1996) found that after grading all student responses to one test question, instructors who reviewed their grades on the first few responses and then compared those responses to the others often found reason to alter the original grades. Likewise, instructors who grade assignments made early in term with a close and critical attention which is not reflected in judging later assignments are risking another form of evaluation bias, particularly if the skills needed for success in the early assignments are not the same as those needed for the work assigned later (Hughey and Harper, 1983).

In conclusion, including the assessment of college instructors' evaluation methods as part of a yearly and accumulative formal professional appraisal helps to minimize the difficulties associated with developing fair and equitable evaluation. Applying the strategies discussed above should help identify the sources of evaluation bias and reduce its likelihood, and should produce more reliable and equitable evaluations. If evaluation is the instrument we use to measure how well our students have achieved the intended curriculum outcomes of the courses we teach, improving the quality of that evaluation must be an integral part of better teaching.


Babad, E.Y. (1985). "Some Correlates of Teacher Expectancy Bias." American Educational Research Journal. 22 (2): 175-183.

Burkhart, F.N., and C.K. Sigelman (1990). "Byline bias? Effects of Gender on News Article Evaluations." Journalism Quarterly. 67 (3): 492-500.

"Catonsville Community College Grading Philosophy Survey." (1989). Catonsville Community College, MD. Educational Resources Information Center (ERIC) file ED 311 958.

Chase, C.I., and L.M. Wakefield (1984). "Testing and Grading: Faculty Practices and Opinions." Educational Resources Information Center (ERIC) file ED 256 196.

Clifton, R., and T. Williams (1981). "Ethnicity, Teachers' Expectations, and the Academic Achievement Process in Canada." Sociology of Education 54: 291-301.

Cross, K.P. (1981). Adults as Learners. San Francisco: Jossey-Bass.

DuCette, J., and J. Kenney (1982). "Do Grading Standards Affect Student Evaluations Of Teaching? Some New Evidence On An Old Question." Journal of Educational Psychology. 74 (3): 308-314.

Hughey, J.D., and B. Harper (1983). What's In A Grade? Paper presented at the 69th Annual Meeting of the Speech Communication Association, Washington, DC., November 10-13, 1983.

Kemp, J.E., G.R. Morrison, and S.M. Ross (1996). Designing Effective Instruction. Englewood Cliffs, NJ: Prentice Hall.

Leiter, J., and J.S. Brown (1983). "Sources of Elementary School Grading." Educational Resources Information Center (ERIC) file ED 236 135.

Neff, R. (1989). "Methods of Evaluating Students at the Community College." Educational Resources Information Center (ERIC) file ED 307 936.

Nottingham, M. (1988). "Grading Practices - Watching Out for Land Mines." NASSP Bulletin, 72 (507): 24-28.

Peterson, C.M. (1989). "Simple Strategies for Achieving Equity in the Classroom." Educational Resources Information Center (ERIC) file ED 333 109.

Powell, R., and M. Collier (1990). "Public Speaking Instruction and Cultural Bias." American Behavioral Scientist, 34 (2): 240-250.

Spindel, L. (1996). "Improving Evaluation of Student Performance." The College Quarterly (Spring).

Wells, S. (1994). "The PLA Challenge Process: Recognizing Student Diversity." Paper presented to a Spring 1994 conference of Prior Learning Assessment facilitators at Centennial College, Scarborough, Ontario.

Cathy Coulthard teaches Early Childhood Education at Centennial College in Scarborough, Ontario.


• The views expressed by the authors are those of the authors and do not necessarily reflect those of The College Quarterly or of Seneca College.
Copyright © 1997 - The College Quarterly, Seneca College of Applied Arts and Technology