Recent research confirms that faculty and administrators (in this case, department chairs) draw unwarranted conclusions from small differences in student ratings. That’s a problem when ratings figure in decisions about hiring, reappointment, tenure, promotion, merit increases, and teaching awards. It’s another chapter in the long, sad story of how research on student ratings has yet to be implemented in practice at most places, but that’s a book, not a blog post. Here, my goal is to offer some reminders and suggestions for when we look at our own ratings.
View small differences with a critical eye. The research (referenced below and highlighted in an upcoming issue of the Teaching Professor) explored how faculty and department chairs interpreted small differences in raw mean scores: say, one teacher has a 4.26 on an item regarding the clarity of explanations, and a colleague who teaches pretty much the same course in pretty much the same way has a 3.94. That difference does not justify concluding that one of those teachers explains things better than the other. The accuracy of those ratings can be compromised by small sample sizes, outliers, misinterpretation of the questions, and less-than-perfect instrument reliability. Any of those factors can nudge a score up or down, and none of them has anything to do with how the teacher explains things. In this research, faculty and department chairs (at a variety of institutions and across a wide selection of departments) looked at hypothetical scenarios and judged in favor of the teacher with the higher rating when the difference between the ratings was as small as 0.15 of a point on a five-point scale.
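To see why, it helps to put rough error bars on a raw mean. The sketch below is illustrative only: the two class rosters, their scores, and the 95 percent margin calculation are hypothetical stand-ins, not data from the study.

```python
# Illustrative sketch: with classes of about 20 students, a gap of roughly
# 0.3 between two raw means can sit comfortably inside sampling noise.
# All ratings below are made up.
import math
import statistics

def mean_and_margin(ratings, z=1.96):
    """Return the mean and an approximate 95% margin of error."""
    n = len(ratings)
    avg = statistics.mean(ratings)
    se = statistics.stdev(ratings) / math.sqrt(n)  # standard error of the mean
    return avg, z * se

# Two hypothetical sections rating "clarity of explanations" on a five-point item
teacher_a = [5, 4, 5, 4, 4, 5, 3, 5, 4, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 5]
teacher_b = [4, 4, 5, 3, 4, 4, 3, 5, 4, 4, 3, 4, 5, 4, 4, 3, 4, 5, 3, 4]

for label, ratings in (("A", teacher_a), ("B", teacher_b)):
    avg, margin = mean_and_margin(ratings)
    print(f"Teacher {label}: {avg:.2f} +/- {margin:.2f}")

# The two intervals overlap, so the raw means alone can't support a claim
# that one teacher explains things more clearly than the other.
```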
Aggregate your data. Look at your scores on individual items across different courses, taught during different semesters, and then look at overall ratings of a particular course across multiple sections over several years. Then, if you see trends, you can think about drawing conclusions.
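Here is a minimal sketch of that kind of aggregation, assuming hypothetical courses, terms, and scores (none of this comes from the article):

```python
# Illustrative only: aggregate item scores across courses and semesters
# before trying to read anything into them.
from collections import defaultdict
from statistics import mean

# Each record: (course, term, item, score) -- all values are made up.
records = [
    ("BIO 101", "Fall 2019", "clarity of explanations", 4.1),
    ("BIO 101", "Fall 2020", "clarity of explanations", 4.3),
    ("BIO 101", "Fall 2021", "clarity of explanations", 4.4),
    ("BIO 210", "Spring 2020", "clarity of explanations", 3.9),
    ("BIO 210", "Spring 2021", "clarity of explanations", 4.0),
]

# Average each item within a course across terms, then look for trends
# over several offerings rather than reacting to any single semester.
by_course = defaultdict(list)
for course, term, item, score in records:
    by_course[(course, item)].append((term, score))

for (course, item), scores in by_course.items():
    overall = mean(score for _, score in scores)
    print(f"{course}, {item}: {overall:.2f} across {len(scores)} offerings")
```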
Look at how the instrument is defining good teaching. Rating instruments operationally define teaching. If they ask questions about giving lectures and about being organized, prepared, and fair, the instrument deems those items to be important characteristics of teaching. How closely does that definition correspond with how teaching is being defined in your practice? We can’t usually modify or stop using the instrument provided by our institution, but we’re almost never prevented from soliciting feedback on a set of items that may be more relevant to our conceptions of good teaching. That doesn’t mean that we’re free to define good teaching any way we please. Research has established that some teacher characteristics and some approaches promote learning better than others.
Work for a realistic perspective on the results. Some of us (I’m tempted to say lots of us) are too invested in the numbers. We have to care; the numbers are used for and against us. Nonetheless, we can’t let their importance blind us to what they’re providing. Feedback on end-of-course rating instruments offers a view of your teaching. It’s not a 360-degree panorama, but rather something closer to the view provided by a small window. And if the instrument isn’t very good, it’s like looking through a dirty window. For years, most ratings experts have advised institutions and individuals to collect data from multiple sources and in different ways. We don’t need just one view. We should be looking at our teaching from every window in the house.
Don’t look at the results for answers; use them to generate questions. So you’ve got a data set. What questions does it raise? What tentative explanations need further verification? If some of the data conflicts, that’s the easiest place to start asking questions.
Let others help you find your way to the answers. Start with students. Tell them you’ve got some questions about the results; you need more information. They can provide valuable feedback if they’re asked focused, specific questions, and they will do so if they think you care about their answers. Colleagues can also help us with accurately interpreting the results. They can tell us if they think we’re overreacting to negative feedback. They can tell us if a conclusion we’ve reached seems justified by the results. They can offer alternative explanations and brainstorm potential solutions.
What strategies do you use to maintain perspective on your rating results?
Reference: Boysen, G. A., Kelly, T. J., Raesly, H. N., & Casner, R. W. (2014). The (mis)interpretation of teaching evaluations by college faculty and administrators. Assessment & Evaluation in Higher Education, 39(6), 641–656.
Comments
The averages of these ordinal measures should never be used as cardinal measures. It is the distribution of the responses that should be considered. If you are using a five-point scale, differences that come about because one prof gets more 4s and fewer 5s than another do not likely signal any difference in teaching quality. However, if a teacher regularly has more than 10% of the class rating performance with 1s and 2s, then there is likely an issue worth more attention. Student comments can be useful indicators of what is going on. Peer observation is a useful tool as well.
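A quick sketch of the check this comment describes; the ratings and the exact 10% cutoff are illustrative, not a validated threshold:

```python
# Illustrative only: flag an item when more than 10% of respondents
# rate it 1 or 2, rather than comparing raw means across teachers.
from collections import Counter

def low_rating_share(ratings):
    """Fraction of responses that are 1s or 2s on a five-point item."""
    counts = Counter(ratings)
    return (counts[1] + counts[2]) / len(ratings)

# Hypothetical distribution for one item in one section
ratings = [5, 4, 4, 2, 5, 3, 4, 1, 4, 5, 4, 2, 4, 5, 3, 4, 4, 5, 4, 4]
share = low_rating_share(ratings)
print(f"Low ratings: {share:.0%}")
if share > 0.10:
    print("Worth a closer look at the comments and a peer observation.")
```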