Scaling

A Quickly-Written Essay on Scaling

Part 1. A Few Opening Quotes

Just to get us started, let me quote a few things from the syllabus that I provided at the very beginning of the course:

"Grades will be scaled in order to maintain equity among sections and to conform to University, Faculty, or Department norms."

"The primary function of grades is to inform you (and other people) as to your performance relative to other students taking the course."

"In order for grades to serve this function, it's important that average performance is reflected in an average grade (something in the C range), that better-than-average-but-not-great performance is reflected in a better-than-average-but-not-great grade (something in the B range), and so forth."

"The Faculty of Arts is very concerned about "grade inflation" and has set very firm guidelines for appropriate distributions of grades in courses at all levels. This is something we really pay attention to in the Psychology Department – especially in courses like PSYC 217 for which there are multiple sections taught by multiple instructors. So, for this course, the expectation is that the average final grade will be around 65 (that's a C+) and will be normally distributed around that mean (producing just as many failing grades as A's). I am compelled to construct, mark, and/or scale the exams and assignments in such a way to ensure that the distribution of final grades in this class meets these guidelines."

Part 2. What Grades Mean (and What They Don't)

So why do you get grades anyway? Here's the reason: So that when you apply for a scholarship, or for a job, or for graduate school, or whatever, the folks who are examining your application have some means of comparing you to the many other folks applying for that same fellowship, job, or grad school position. In short: Grades are designed to reflect your performance relative to your peers.

That last phrase ("relative to your peers") is key. Grades are not designed to convey, in any absolute sense, just how much knowledge or skill you have. In fact, there's no way they can accurately convey that. Absolute levels of knowledge and skill are impossible to measure, impossible to quantify. Nor is any absolute level of knowledge or skill especially relevant to most of the decisions (e.g., hiring for jobs, funding of scholarships, admission to graduate school) that people make based on your grades. What people mostly want to know isn't how well you've performed, period. What people mostly want to know is how well you've performed, relative to your peers.

Here's an example. When I review applications for folks applying to grad school in psychology, I care about a student's overall grade point average. I don't want that GPA to tell me the percentage of items a person got right on, say, an economics exam; I don't even want that GPA to tell me the percentage of items a person got right on a psychology exam. (After all, unless I taught the class myself, I don't have any basis for judging the meaningfulness of those numbers.) Those numbers alone aren't particularly helpful in discriminating great students from good students, or good students from average students. When I look at a person's GPA, I want those numbers to tell me, more specifically, how well this student did in each class, relative to his or her peers.

For this reason, the raw performance on any exam (e.g., the percentage of items you get right or wrong) is merely the starting point for helping determine your actual mark on that exam. But it ain't the ending point. It provides the basis from which your mark can be figured out. That eventual mark should be designed to tell you, and other people, how well you did relative to other people in the course.

Part 3. Why There Exist Norms and Standards that Govern Distributions of Grades

Assigning meaningful grades would be easy if every student on the planet (or even every student at UBC) took exactly the same courses, with exactly the same instructors, same paper assignments, and same exams. But that doesn't happen. Not every student who takes PSYC 217 has the same instructor or same exams. (There are multiple sections, taught by different people.) This makes it especially important that grades reflect relative performance, rather than simply raw scores on exams. And it means that there must exist some sort of broad institutional norms about how grades are assigned, and what those grades actually mean.

Across different universities, numerical grading scales differ widely. (At the University of Alberta for many years, numerical grades have been recorded as 9, 8, 7, etc. At most American universities, scores in the 90's are A's, scores in the 80's are B's, scores in the 70's are C's, and so forth. Here at UBC we have our own unique – and weirdly non-linear – system that differs from just about every other university on the planet.) But, happily, there is a cultural convention to use letter grades in a commonly-understood way: Grades in the A range are expected to designate truly superior performance (relative to other students in a class), grades in the B range are expected to signify good performance (relative to other students), grades in the C range are expected to signify average performance (performance at roughly the class average), and so forth.

(There also has been some slippage – some tendency toward grade inflation – over the years; and so these days, in classes like PSYC 217, average performance is expected to result not in a grade in the middle of the C range, but rather in the upper realm of the C range. And in some classes – typically upper-level courses that draw on a more select sample of students – the average performance may actually be signified by something in the B range.)

It is in recognition of the necessity for common standards of grading that the UBC Psychology Department (like most departments on campus) follows some fairly strict guidelines for the assignment of grades. These guidelines appeared in the syllabus, and they dictate what the distribution of final grades must look like for this class. Among other things, these guidelines dictate approximately what the average grade should be in the course (an average in the high C range), and what the distribution of grades should be around that average (a normal distribution, or "bell curve," with grades spreading out symmetrically on both the upper and lower ends of the distribution).

Part 4. What This Means I Must Do

I gotta follow these guidelines.

And because I gotta follow them, I make sure to tell students right from the beginning of the course that I'm going to follow them. That's why I made this clear in the syllabus, and that's why I talked about it on the first day of class.

(But perhaps I should have spent a lot more time driving the point home. I sorta assumed that, because you've presumably taken introductory psychology and because grades are scaled in that class, you're familiar with scaling policies and their implications. I'm now thinking I assumed too much. Hence this hastily-written essay.)

There are a variety of different ways to make sure that the grades in this class follow those guidelines. One way is to design graded material in such a way that it results, semi-magically, in a distribution of raw scores that fits perfectly the expected distribution of grades. Ideally, when marking written products (e.g., papers), the marking scheme that we use will result in a set of scores that conform perfectly to the grading guidelines established for the class. And, ideally, exams are constructed in such a way that the distribution of raw scores on those exams conforms perfectly to the grading guidelines established for the class.

In fact, though, that rarely happens. Especially with exams. It truly would be almost magical if it did happen with an exam – especially a relatively brief multiple-choice exam.

But that's okay, because those raw scores don't have to have any ultimate meaning all on their own. They are just the starting point. And there are lots of scaling mechanisms that we can use to transform any distribution of raw scores (no matter how high or low the average might be, no matter how non-normal the distribution might be around that average) into a new set of numbers that meet the distributional requirements for actual grades.

And so, that's why we use these scaling procedures.
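To make that concrete, here is a minimal Python sketch of one such mechanism: find each raw score's percentile rank in the class, then read off the matching point on a normal curve. This is an illustration only, and not necessarily the procedure used in this class. The function name `scale_to_normal` and the spread `target_sd=12.0` are assumptions made up for the example; the only number taken from the guidelines is the target average of 65.

```python
from statistics import NormalDist


def scale_to_normal(raw_scores, target_mean=65.0, target_sd=12.0):
    """Map raw scores onto a normal curve by percentile rank.

    target_sd is a hypothetical value chosen for illustration;
    the guidelines quoted in this essay state only the mean (65).
    """
    n = len(raw_scores)
    target = NormalDist(mu=target_mean, sigma=target_sd)
    scaled_by_raw = {}
    for score in set(raw_scores):
        below = sum(1 for s in raw_scores if s < score)
        tied = sum(1 for s in raw_scores if s == score)
        # Mid-rank percentile: always strictly between 0 and 1,
        # and identical for students with identical raw scores.
        percentile = (below + tied / 2) / n
        scaled_by_raw[score] = target.inv_cdf(percentile)
    return [scaled_by_raw[s] for s in raw_scores]


# Made-up raw scores on a 50-item exam, bunched toward the high end.
raw = [22, 35, 38, 40, 40, 42, 45, 47]
scaled = scale_to_normal(raw)
```

Because tied raw scores share a percentile, they get the same scaled mark, and a higher raw score always gets a higher scaled mark, no matter how lopsided the raw distribution was to begin with.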

(In a lot of classes, a scaling procedure may happen more-or-less "behind the scenes." I prefer to be up-front about it. Especially for a class like PSYC 217, because the issues involved are relevant to research methods in psychology.)

Part 5. Scaling Up and Scaling Down

People often talk loosely about "scaling up" and "scaling down." From a purely logical perspective, those terms are sorta meaningless. Scaling is scaling. Regardless of what percentage of items on an exam you get right or wrong, the whole point of scaling is to arrive at a set of numerical values (scaled marks) that (a) conform to the grading guidelines, and (b) fairly represent each student's performance relative to the performance of others in the class. Whether any particular student's mark appears to go up or down depends on what the distribution of raw scores looked like in the first place. Rarely does any sensible scaling procedure simply add some constant value to every student's raw score (e.g., add 5 points to every student's raw percentage), or subtract some constant value (e.g., subtract 5 points from every student's raw percentage). Those sorts of crude adjustments are rarely sufficient to meet the desired distribution of marks. In order to satisfy the goal, many scaling algorithms end up adjusting some students' scores more than others'.
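Here's a small Python sketch of why a constant adjustment usually isn't enough. It compares adding one constant to every score with a simple linear rescaling that matches both the average and the spread. The raw scores, the function names, and the target spread of 12 are all made up for illustration; this is not a description of the algorithm actually used in the course.

```python
from statistics import mean, pstdev


def shift(raw, target_mean=65.0):
    """Add one constant to everyone: fixes the average, not the spread."""
    delta = target_mean - mean(raw)
    return [x + delta for x in raw]


def linear_rescale(raw, target_mean=65.0, target_sd=12.0):
    """Match both average and spread (target_sd is a hypothetical value).

    Assumes the raw scores are not all identical (spread must be nonzero).
    """
    m, sd = mean(raw), pstdev(raw)
    return [target_mean + target_sd * (x - m) / sd for x in raw]


raw = [60.0, 70.0, 80.0, 90.0, 100.0]  # made-up raw percentages, average 80
shifted = shift(raw)            # average is now 65, spread unchanged
rescaled = linear_rescale(raw)  # average is 65 and spread is 12

# How far each student's mark moved under the linear rescaling.
changes = [new - old for old, new in zip(raw, rescaled)]
```

In this example the shifted marks keep exactly the spread of the raw scores, while the rescaled marks move by a different amount for each student: the student at the class average moves 15 points, and students farther from the average move more or less than that.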

Is that fair? Yes – as long as the following objectives are satisfied for all students: (1) Your scaled mark is exactly the same as the mark assigned to any student whose raw performance on the exam was identical to yours; (2) your scaled mark is higher than the mark assigned to any student whose raw performance was worse than yours; (3) your scaled mark is lower than the mark assigned to any student whose raw performance was better than yours. As long as those objectives are met, then the distribution of scaled marks does fairly represent each student's performance relative to the other students in the course.
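Those three objectives are easy to check mechanically. The Python sketch below (the function name `is_fair` and the sample marks are hypothetical, invented for this example) compares every pair of students and confirms that equal raw scores get equal scaled marks and that a better raw score always gets a better scaled mark.

```python
def is_fair(raw, scaled):
    """Return True if the scaled marks satisfy all three objectives."""
    pairs = list(zip(raw, scaled))
    for r1, s1 in pairs:
        for r2, s2 in pairs:
            # Objective 1: identical raw scores, identical scaled marks.
            if r1 == r2 and s1 != s2:
                return False
            # Objectives 2 and 3: a better raw score must get a strictly
            # better scaled mark (checking every ordered pair covers both).
            if r1 > r2 and s1 <= s2:
                return False
    return True


# Made-up examples.
ok = is_fair([30, 40, 40, 45], [60, 68, 68, 74])   # True
ties_split = is_fair([30, 40, 40], [60, 68, 69])   # False: a tie was broken
order_lost = is_fair([30, 40], [65, 65])           # False: 40 did no better than 30
```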

Part 6. Logic versus the Psycho-logic of Scaling

You might have read all this way and still be thinking "That all sounds very cool and calculating and logical, sure. But: Logical schmogical! Something still feels crummy about all this." Fair enough. So let me step off of this highfalutin pedagogical platform and adopt the student's perspective for a moment.

First, I'll point out that, in addition to the stuff I talked about already, scaling is designed to protect the integrity of a student's grades from the idiosyncrasies of instructors and the exams they give. Since it's near-impossible to create an exam that produces raw scores that conform perfectly to the distributional guidelines for grading, instructors are likely to give an exam that either (a) produces scores that are, on average, too low, or (b) produces scores that are, on average, too high. If you got an exam that was a real ass-kicker, in which the average percentage right was, say, 55%, you'd be howling and demanding some adjustment to the marks. In essence, you'd probably be saying, "Those raw scores are not diagnostic! You should scale these scores so that my performance is more fairly judged relative to my peers in the class." And you're right: I should be doing that; I should be scaling the grades. Of course, exactly the same logic applies in the opposite situation. If I give an exam that turns out to be too easy (and produces a set of raw scores in which the average percentage right is, say, 80%), I should be doing exactly the same thing: I should be scaling the scores so that your performance is more fairly judged relative to your peers in the class.

(That's what I did for the midterm in our class this term. I followed exactly that logic.)

But again, that's just logic. And while students often find the logic of scaling very appealing when they've just been collectively bloodied by an overly-harsh ass-kicker of an exam, there's a different set of considerations afoot when the opposite is true.

One consideration, of course, is this: You want your eventual grade to be as high as possible. That desire, that motive, leads to a different psychological response to different scaling situations: Scaling feels like a perfectly fair and logically-required form of relief when marks are "scaled up," and scaling feels like unfair punishment when marks are "scaled down." I hope that some part of the preceding essay helps you reckon with any sense of being unfairly punished, if you're feeling that way.

And maybe this will further help: Let's do a "thought experiment":

Imagine first that, on a 50-item exam, the average raw score was 40 (a raw percentage of 80%), and that your score was right at the class average: 40. Because of the scaling procedure, that raw score of 40 turned into a scaled mark of 68. (A score that is actually a bit higher than demanded by departmental guidelines, but the instructor just couldn't bear to scale things "down" any more than that.)

Now imagine that the 50-item exam was a real ass-kicker, and the average raw score in the class was a 30 (a raw percentage of 60%), and that your score was – again – right at the class average. The instructor scaled scores "up" so that the overall class average was right at the average recommended by Departmental guidelines: 65. So, your score of 30 turned into a scaled mark of 65.

You'll notice that the first scenario matches very closely the situation that you encountered for the midterm in our class. And if you're that student, it may indeed feel lousy to have your mark scaled all the way "down" to a 68. It may feel unfair to you. On the other hand, if you're the student in the second situation, you're probably not gonna have any complaints about the scaling procedure – it feels fair and justifiable because your grade got scaled "up" to a 65. But, if you take the long view, which mark would you ultimately rather have, a 68 or a 65?

I don't know about you, but I'd rather have the 68. The point of this little exercise is to distinguish between the psychological ramifications of scaling, and the actual meaningful ramifications in terms of the grade that it ultimately leads to – because it is that grade (and not the psychological feelings) that will show up on the transcript that folks look at when you apply for jobs, scholarships, graduate schools, and that sort of thing.

Part 7. Motivation

An additional issue has to do with the motivational consequences of all this. Some students find it de-motivating when their marks are scaled "down." ("Why should I bother to study if my grade is just gonna be scaled down anyway?") For this reason, one might argue that instructors should always give ass-kicker exams, so that when they scale grades they always scale "up." The trouble is, many students find it de-motivating when they get an ass-kicker exam. ("No matter how much I study, no matter how much any of us study, we all end up doing so bad on the exam that the marks have to be scaled up! Why should I bother to study at all?") There's no perfect solution to this problem.

Except, of course, to magically create exams that obviate the need to do any scaling at all. And, that truly is hard to pull off.

Another solution, I suppose, would be to not do any scaling on any exams or assignments throughout the course – to just provide students with unscaled raw scores – and then just do some final scaling at the very end, after the final exam. The trouble with that option is that students may be misled about what their level of performance actually is, and they may suffer bad consequences as a result. (E.g., if I got 30/50 on the ass-kicker exam, I might be tempted to just drop the course because I'm thinking that I'm destined to get an unacceptably low grade in the course – unless I was given the very useful information that my performance was actually right around the class average. Or if I got 40/50 on the overly-easy exam, I might be duped into thinking that I'm well on my way to getting an easy A, and therefore taking it easy the rest of the way – when in fact I'm hovering right around C+ territory.) To avoid the risk of misleading students about where they stand in the class (in terms of their performance relative to their peers), I'd rather apply scaling procedures on every exam and every paper.

So, how do you keep your motivation up? Again, I'm hoping that some of the stuff I've written in this little essay might help.

The key take-home message, ultimately, is this: For better or for worse, your marks on exams and papers – and your final grade in this course – will fairly reflect your performance relative to your peers in this course. To the extent that your performance is better than the majority of them, you will get a better-than-average grade. To the extent that your performance is better than an even greater number of your fellow students, you will get an even better course grade.

And, more generally, throughout your university career, your GPA will reflect how well you perform in your classes, in comparison to the other students who take those classes. The more hours you study, the more intrepidly you apply yourself to those courses, the more likely you are to do better than your peers.

That message, I hope, will help keep you motivated.