College Quarterly
Spring 1997 - Volume 4 Number 3

Reducing Bias In Evaluation, Part I: The Sources of Bias.

by Cathy Coulthard

Evaluation is an essential component of teaching and learning, and of the curriculum process. If we view curriculum as intended learning, then evaluation is the instrument we use to measure how well the learner has met the intention. Evaluation should effectively assess the degree to which a learner has acquired the knowledge, skills and attitudes expected from the curriculum outcomes. Thus, the quality of evaluation is an integral part of teaching and imparting knowledge and skills to students (Neff, 1989).

In most instances, instructors are solely responsible for the evaluation and grading of students who are taking their courses (Neff, 1989). Ideally, if the process of evaluation truly measures learning achievement, any instructor will award exactly the same grade as any other for a particular assignment. Of course, this is seldom true, and experience suggests that even in the best of circumstances, developing consistently fair methods of evaluation remains a difficult task for many instructors.

Constructing tests and other forms of evaluation which adequately reflect intended learning is a difficult task, and to assume that classroom teachers routinely do so finds no support in the research literature (see Nottingham, 1988; Hughey and Harper, 1983). Because instructors are most often the sole determining influence on the evaluation of students enrolled in their courses, the grades awarded will reflect the instructor's habitual methods of judgment.

The research on evaluation shows that there is no consistent pattern in instructors' approach to this task. Some teachers tend to expect more of their students than do others. Some judge student work severely on early assignments and more leniently on later assignments, while others do exactly the reverse (Hughey and Harper, 1983). Chase and Wakefield (1984) found that the design of student projects and tests through which student achievement is assessed is largely a matter of individual instructor choice. In sum, the accuracy and fairness of evaluation can be limited by inconsistency within and across classroom experience, by a lack of instructor objectivity, or by an inadvertent introduction of evaluation bias.

The literature also shows that assigning grades to students' schoolwork is inherently subjective (Nottingham, 1988), and that few college faculty members have had specific instruction in the design of reliable and methodologically valid evaluation methods (Chase and Wakefield, 1984). Most instructors receive no training in test construction, administration, grading, or other forms of student evaluation (Neff, 1989), even though college faculty are expected to devote considerable time to these tasks.

Teachers' own years of experience as students undergoing evaluation provide some education in what is good or bad about particular evaluative strategies. But absent formal training in evaluation, many faculty draw mainly on that experience, re-creating by imitation the evaluation strategies they understand to have been employed by their own instructors (Neff, 1989).

Often, that replication of traditional methods fails to address the issues which have led to inequity in evaluation. While many faculty believe that the same products and standards should be used for all students in their classes (Chase and Wakefield, 1984), the common determinants of student grades can variously include the assessment practices of the instructor, interaction between the instructor and student, and the instructor's knowledge of the student (Hughey and Harper, 1983).

In the Catonsville Community College Grading Philosophy Survey (1989) a majority of faculty surveyed indicated their belief in assessing students according to common outcomes and standards. In suggesting that equivalent products and standards should be required across multi-section classes, many respondents recognized the inherent subjectivity of grading and evaluation processes, and the persistence of bias in evaluation.

A theoretical awareness of the evaluation bias problem must be accompanied by a practical understanding of its causes and through that understanding, by efforts to develop and maintain reliable and equitable evaluation methods. Most educators are familiar with the notion that faculty ignorance of or inattention toward the diversity of the population of learners in their classrooms will corrupt student-teacher perceptions, interactions, and expectations. Insensitivity to the implications of that diversity in and across classrooms is an invitation to misevaluation.

Misevaluation may result from the potentially skewing effects of student race and gender, student reputation, and student deportment in class (Leiter and Brown, 1983), and these variables can be cumulative and interactive in producing bias. Research in this field has clearly established that however inadvertently, such bias easily influences presumably objective processes of evaluation (see Babad, 1985) and has found that student grades are "... affected markedly by the cognitive and normative expectations that teachers have for students" (Clifton and Williams, 1981, cited in Leiter and Brown, 1983: 3).

How are those expectations typically determined? We know that messages attributed to sources identified as highly credible tend to be perceived as "more fair" and "more justified," and thus often have more impact on opinion, than do even the identical messages when attributed to individuals of low credibility (Burkhart and Sigelman, 1990). In a study of teacher expectancy bias, a sample of teachers were asked to assign a grade to identical papers written by a "Daniel Cohen," identified to half of the teachers as an "excellent student" and to the other half as a "weak student." The resulting grades clearly demonstrated an expectancy bias due to the ability label (Babad, 1985).

This adds weight to the argument that student reputation and classroom deportment can significantly affect the summative evaluation determined by an instructor. Instructors might give lower grades to students who challenge discipline standards, who question commonly held viewpoints, who appear neither interested nor willing to be involved in activities organized by the teacher, or whose frequent absences are deemed to betray a lack of commitment to schooling.

It seems reasonable to assume that these negative judgments will be cross-reinforcing and thus amplified in effect, that they will color the interactions between individual teachers and students, and that they will negatively alter the perceptions of those students by other teachers who, formally or even through casual collegial interaction, are made aware of the students' reputation. In these circumstances, teachers can be expected to award grades on the basis of observed or reported student behaviors and imputed attitudes which frequently have little to do with measuring what the student has learned.

How a faculty member assesses variables such as "participation" is an item of considerable controversy in itself (Chase and Wakefield, 1984). It is arguable that "participation" has become a useful synonym for what it often measures, that is "attitude." The Catonsville survey (op. cit.) found that even while recognizing the problems of subjectivity and evaluation bias, a significant number of faculty members cling to continued use of such variables as "participation" and "attitude" as devices for manipulating grades, although the assessment of these activities has been shown to be purely subjective, typically unrecorded, and usually not indicative of behaviors which are intended to be altered by instruction in the subject matter.

Supposing that most faculty are at least familiar with the notion that "participation" and "attitude" measures are inherently unreliable and potentially unfair, it seems a reasonable supposition that such measures persist, at least in part, because they are handy ways to respond subjectively to imputed student attitudes, if not behavior. Some evidence suggests the temptation to use grades as weapons is often overwhelming (Nottingham, 1988) when students disrupt the classroom, display apathy, or challenge authority.

Relatedly, Chase and Wakefield (1984) found that students' social skills, verbal fluency, and physical attractiveness are often influential in awarding grades. All these influences are cumulative and interdependent, and they are further compounded when teachers try to make their grades conform to previous grades, in which case instructors' expectations of students can significantly distort the evaluation of homework assignments and examinations (Leiter and Brown, 1983).

Ultimately, all the characteristics which influence a student's reputation, and which are consequently reflected in evaluation, can change the student's self-perception and be reflected in behavior. Students soon learn what is expected of them and adjust their attitudes and classroom behavior accordingly (Neff, 1989), and negative expectations can result in a vicious cycle of diminished learning and misevaluation.

Instructor response to diverse student populations' age, gender, race and educational background also can create bias in evaluation. Some teachers continue to maintain racial, ethnic, or gender prejudices (however subconsciously held, or even seemingly benign) which inevitably alter perception and change expectation, and are thus a significant influence on grading (Leiter and Brown, 1983). Reliance on such factors is now seldom explicit, but the manifestation of prejudice need not be overt or direct. For example, Burkhart and Sigelman (1990) established that in judging students' credibility, teachers can be susceptible to both race- and gender-based characterizations which (while perhaps otherwise important to the design of curricular and teaching strategies to account for diverse needs) are largely irrelevant to evaluation.

Another source of evaluation bias is the choice and form of words used in evaluation methods, usage which can be misleading or unfamiliar to students. This bias is often unintentional, and the instructor may be entirely unaware of its effect. Of course, the choice of evaluative language should avoid references that students reasonably could find offensive, references to unfamiliar items or ideas, and stereotypical representations (Wells, 1994). But even the most seemingly innocuous usage can cause difficulty. For example, a student responded to this author's test question by asking why, in its premise, a childcare center was said to have mailed menus to every child's parent(s), when the purpose of the test question was to elicit how many weeks of menus should be posted to adequately inform parents.

That illustration supports the argument that the cultural background and communication experiences which students bring into classrooms may be a fundamental predictor of success in a course (Powell and Collier, 1990). For example, many common themes used as examples in evaluation reflect traditional white male interests such as sports (Peterson, 1989), or otherwise employ metaphorical references which might seem obscure or unintentionally give offense to students of particular cultural backgrounds. Usage which is simply unfamiliar is as potentially disruptive to evaluation as that which is offensive, and the former might be harder to recognize than the latter. Thus, instructors should carefully consider both the clarity of the language used in their test questions, and those questions' apparent purpose.

Bias can also take the form of the time allocated to the evaluation procedure. Cross (1981) suggests that adult students have slower reaction times, which can slow the learning process. Experience suggests that this might be even more significant for "nontraditional" older students who might also experience additional language or physical differences. (Many of this author's own Early Childhood Education students are over 25 years of age and have English as a second language.) While a nontraditional classroom population is a good indicator that extra care might be required, it is always prudent to consider that the format of the evaluation is unlikely to be uniformly familiar to students from different age, culture, gender and educational backgrounds.

Similarly, it is in itself a bias to assume, as previously noted, that the same products and standards should be used for all students in the class. Although instructors usually make assumptions about their students' prior knowledge, instructors and students might not share the same knowledge base, and students' prior knowledge and experience of evaluation vary considerably among individuals (Catonsville survey, op. cit.; Chase and Wakefield, 1984). For instance, students under 25 who have completed their highest level of education within this country and who speak only English at home report having a broader range of experience with all types of testing and evaluation (Wells, 1994). That familiarity can provide both advantage for them and relative disadvantage for others, a potential inequity which requires remedy in the design of demonstrably reliable and useful evaluative methods.

As to that utility, evaluation bias clearly has important implications for curriculum. That is because evaluation is an essential component in the curriculum process, and the instrument we use to measure the degree to which students have acquired the knowledge, skills and attitudes expected from the curriculum. Anticipating the difficulties associated with developing effective and equitable evaluation is only a beginning. Further examination of the strategies and skills required to overcome these barriers will be included in Part II of this discussion, Strategies to Overcome Bias, forthcoming in The College Quarterly.


Babad, E.Y. (1985). "Some Correlates of Teacher Expectancy Bias." American Educational Research Journal. 22 (2): 175-183.

Burkhart, F.N., and C.K. Sigelman (1990). "Byline bias? Effects of Gender on News Article Evaluations." Journalism Quarterly. 67 (3): 492-500.

"Catonsville Community College Grading Philosophy Survey." (1989). Catonsville Community College, MD. Educational Resources Information Center (ERIC) file ED 311 958.

Chase, C.I., and L.M. Wakefield (1984). "Testing and Grading: Faculty Practices and Opinions." Educational Resources Information Center (ERIC) file ED 256 196.

Clifton, R., and T. Williams (1981). "Ethnicity, Teachers' Expectations, and the Academic Achievement Process in Canada." Sociology of Education 54: 291-301.

Cross, K.R. (1981). Adults as Learners. California: Jossey-Bass.

DuCette, J., and J. Kenney (1982). "Do Grading Standards Affect Student Evaluations Of Teaching? Some New Evidence On An Old Question." Journal of Educational Psychology. 74 (3): 308-314.

Hughey, J.D., and B. Harper (1983). What's In A Grade? Paper presented at the 69th Annual Meeting of the Speech Communication Association, Washington, DC., November 10-13, 1983.

Kemp, J.E., G.R. Morrison, and S.M. Ross (1996). Designing Effective Instruction. Englewood Cliffs, NJ: Prentice-Hall.

Leiter, J., and J.S. Brown (1983). "Sources of Elementary School Grading." Educational Resources Information Center (ERIC) file ED 236 135.

Neff, R. (1989). "Methods of Evaluating Students at the Community College." Educational Resources Information Center (ERIC) file ED 307 936.

Nottingham, M. (1988). "Grading Practices - Watching Out for Land Mines." NASSP Bulletin, 72 (507): 24-28.

Peterson, C.M. (1989). "Simple Strategies for Achieving Equity in the Classroom." Educational Resources Information Center (ERIC) file ED 333 109.

Powell, R., and M. Collier (1990). "Public Speaking Instruction and Cultural Bias." American Behavioral Scientist, 34 (2): 240-250.

Spindel, L. (1996). "Improving Evaluation of Student Performance." The College Quarterly (Spring).

Wells, S. (1994). "The PLA Challenge Process: Recognizing Student Diversity." Paper presented to a Spring 1994 conference of Prior Learning Assessment facilitators at Centennial College, Scarborough, Ontario.

Cathy Coulthard teaches Early Childhood Education at Centennial College of Applied Arts and Technology in Scarborough, Ontario.


• The views expressed by the authors are those of the authors and do not necessarily reflect those of The College Quarterly or of Seneca College.
Copyright © 1997 - The College Quarterly, Seneca College of Applied Arts and Technology