Who’s kidding who? What do tests test?

What does a test “test”?

All over the world, at all levels of education, mathematics teachers set tests for their students, and give grades on the basis of those tests.

Tests are usually thought to be reliable indicators (if not measures) of how well students engage with the material being taught to them, and test results are often believed to correlate in some way with students’ mathematical ability.

Where a student ranks – their class standing – on a test may well be an indicator of both those things.

I want to argue, however, that what a test really tests is the teacher. A test, in my view and experience, tests a teacher’s ability to set a test at the “right” level.

Typically, students will be awarded a percentage score on a test, indicating the fraction of available points a student obtained as a result of the correct answers they gave.

Standards for grading answers vary widely. Some teachers give partial credit, some do not. Some teachers are lenient in relation to answers that require interpretation, others are not.

A certain number of percentage points is taken to be a pass. Below that critical percentage a student is deemed to have “failed”; at or above it the student has “passed”.

In many cultures, particularly that of North America, a mere “pass” is regarded by students and their parents as a shameful outcome. In my experience the overwhelming majority of North American students see themselves as A or B students, and certainly not as D students.

Grading standards vary widely in the United States.

One variant, not uncommon in my experience, is to convert percentages into letter grades as follows:

A = 90% or higher

B = 80%–89%

C = 70%–79%

D = 60%–69%

F = 0%–59%
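
As a rough illustration, the conversion above can be sketched in a few lines of Python. The function name `letter_grade` is hypothetical, and the handling of fractional scores is an assumption: the table does not say what happens to a score such as 89.5%, so this sketch applies no rounding and treats it as a B.

```python
def letter_grade(percent: float) -> str:
    """Map a percentage score to a letter grade using the cutoffs listed above.

    Assumption: no rounding is applied, so 89.5 counts as a B, not an A.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Cutoffs are checked from highest to lowest; the first one met wins.
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percent >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    for score in (95, 89.5, 70, 59.9):
        print(score, letter_grade(score))
```

A teacher who rounds 89.5% up to an A would get a different letter from the same paper, which is one small instance of how the standards vary.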

The interpretation of these letter grades varies quite a bit.

For example, college students often may not count a grade below C as a “pass” for the purposes of their major, but may count a “D” as a “pass” if it is in a subject outside their major. Many engineering students, for instance, can carry a D in calculus as a passing grade.

The typical college level passing percentage in Australia is 50%, and in the United Kingdom it is 40%.

Curiously, academic standards in these countries, overall, seem to run inversely to the passing percentage. The UK has generally higher academic standards than Australia, which, in turn, has generally higher academic standards than does the United States. What I mean by “higher academic standards” is that at any given level of academic attainment one will find UK students to be generally more advanced in knowledge, skills, and achievement than Australian students, who in turn are generally more able than students in the United States at the same grade level.

How does a teacher construct a test?

A colleague changed the order of the questions on a test for a course he taught each year, and obtained disastrous results. Typically his students would study tests from previous years in preparation for that year’s test. The mere fact of changing the order of questions was enough to send a number of students into a tailspin.

Teachers can easily set a test that all but the best and brightest students will fail. This does not require setting material that students have never seen: just pushing the standards a little on reasoning and rigor will generally do it. Or just change the order of the questions from previous years’ tests.

Teachers who do this regularly will be seen by their colleagues as being too tough, unrealistic, and out of touch.

On the other hand it is easy to set a test for which all but the least diligent students will pass, and for which most students will do very well: simply focus on routine calculational questions, all of which have been practiced often beforehand.

Teachers who do this will be seen by their colleagues as too easy, too soft, and not fulfilling their obligations to properly assess students.

So a conscientious teacher wants to steer a course between being overly demanding, and being too soft.

Using various skills and prior experience, therefore, a teacher will generally set a test that has a likely outcome of a reasonable number of students passing, not too many top marks, and a few, but not too many, failures.

In other words, what is being tested when a teacher constructs a test is, in fact, the teacher’s ability to set an “appropriate”, or satisfactory, test.

The myth of objectivity

Many people, especially administrators, like test scores because they are apparently “objective”.

This imagined objectivity is illusory for several reasons.

We have already addressed how a teacher needs to construct a test carefully so that not everyone fails, not everyone does exceptionally well, and the teacher’s colleagues are satisfied that an adequate and appropriate job of assessment is being done.

There is no objective standard for this: it is simply a matter of practice embedded in a particular culture.

Even within a given test culture there is the problem, in testing mathematics, of giving partial credit, of deciding what to do if a student makes a simple calculational error but then proceeds more or less correctly in their application and reasoning following that.

A final grade – percentage or letter – is an average of points obtained as a result of many such decisions.

There is very little that is objective about this process.

To see why, one need only give the same test papers to a colleague, with the same grading instructions, and see how the awarded grades vary. Even more telling is for a teacher to re-grade test papers a week or so later and see how consistent the grades are.

The pain of believing in the objectivity of test scores

Students who fail to pass mathematics tests usually believe they are failures.

They are not.

They may not have studied all the material at a depth that would make a pass or higher grade likely; they may have been ill for some or all of a semester; they may have had family problems; they may have seriously misunderstood a teacher’s intentions and instructions. They may have attention deficit disorder, be highly anxious, or be on medications that affect their ability to focus.

All of these factors are real and should be understood as real mitigating factors.

Believing in the objectivity of a test and the consequent view of oneself as a failure is highly damaging. It lowers self-esteem and inhibits future effort.

Conversely, students who do study conscientiously, who are relatively free of stress, and who score highly on tests are very likely to see their success as an intrinsic ability: they begin to see themselves as “A” students. Such students may well have high or even exceptional ability, but a class test is not providing evidence for that. What the test provides evidence for is that some factor, or combination of factors, led a student to a good outcome. Those factors can include high ability, but almost certainly will include time spent studying, effective study habits, belonging to a study group of peers, asking questions of the teacher, taking notes, listening carefully, a low stress environment, and a dash of luck. Believing in the objectivity of tests leads a student to attribute their success on tests to only one of the factors: ability.

Should we use tests?

I don’t know.

I no longer use them, because I see tests as testing my ability to set them appropriately, and not as testing the intrinsic ability of students.

I can get an estimate of student work habits from tests, but frankly I’m not much interested in a test to tell me this when I see them regularly in class working on projects. I can see how much work they are putting in: I do not need a summative test to tell me that.

Frankly, I feel we put too much emphasis on tests and examinations. My own preference is for more collaborative project work in which students can exercise their ability to think, to reason, to plan, and to work toward a goal, utilizing their skills and talents in conjunction with others.

Education, for me, should be a win-win activity, not something we carry out to determine – by phony means in my view – who is a success and who a failure.

Comments

  1. I believe that tests should be used as a diagnostic tool, to help focus the instructor on what material needs to be covered – and how to do so.

    As a diagnostic tool, tests would have real value if given before the lesson and at a midpoint.

    Furthermore, this should help orient the student onto upcoming instruction. I also think that a final “test” is appropriate to ensure that the students have attained a level of proficiency, but that test should take the form of applied knowledge – rather than regurgitation.

  2. >My own preference is for more collaborative project work in which students can exercise their ability to think, to reason, to plan, and to work toward a goal, utilizing their skills and talents in conjunction with others.

    Have you blogged in detail about how you do this? If you have, can you point me to those posts? If not, would you be willing to write more about that? I have classes with 40 students in them; is this possible with that size of class?

  3. I agree wholeheartedly with everything you say.

    I think a big part of the problem with testing is that it is based on the idea of sampling. We only test some part of what we expect students to know, assuming that the result correlates well with what they actually know.

    Another part of the problem is that our obsession with numerical grades means that we test what is easy to grade numerically. That is, we tend to test the most basic, procedural aspects of mathematics, because testing conceptual understanding, or higher-level thinking, etc., is very time-consuming to grade, and requires high-level graders (perhaps most TAs would not be up to the task).

    But to me, the problems with testing point to a deeper problem: we have courses of fixed length, where we rush students through the same curriculum at the same pace, and then we grade them on how much of the material they have learned in that fixed time. Surely in the 21st century we can do better than this. How difficult can it be to tailor curricula for individual students, or at least small groups of students? How difficult can it be to allow students the time necessary to master the material, so that they can feel good about themselves, and properly prepare themselves for subsequent study and for their careers?

    It would certainly make for a different kind of school, where students are working at their own pace on various different things, helping each other, and being helped and guided by teachers.

    For mathematics, each course can be separated into a technical component (or better, technical modules) and a conceptual component. The technical components, at least the modules that are relevant for each particular student, can be tested on a mastery basis. Students take their time until they have mastered each relevant technical component. It should not be difficult to do such testing automatically using computers, even online. The conceptual component can be assessed by requiring projects, presentations, and so on, and this too can be a process, where students work on their projects until they have polished them to a sufficiently high degree.

    All this, I believe, is congruent with what you say in your post, and would go far in replacing the current inhumane system. It places the focus on working with students to help them all reach a high degree of competence, giving them the time and guidance they need to do so.

  4. What we’d really like to get at is this question: “Is the student competent?” Tests sound like the way to go. But tests are time limited, high stress experiences. Solving real life problems does not generally match that description. High stress and anxiety are known to shut down the brain’s memory system.

    What tests can do is help the teacher determine what the weakest areas are for students. Then strategies can be developed to target those weaknesses. Some teachers use quizzes for this. These quizzes don’t count toward final grades.

    If you must measure student performance, homework is better. One might expect cheating. One course I took (in the late ’70s) approached this by giving a different problem set to each student. A computer solved all problem sets, and included graphs. It was disconcerting to discover how little computer time was required.

    Some teachers mark on a curve. The idea is that it’s not known how good the test is, so we’ll assume that the scores will be Gaussian, and grade according to the bell curve. But if only the average and above can pass, it’s unfair to the lower half, especially if it turns out that the lower half are competent. That is, there is a disconnect between passing grades and competence.

    One would expect that industry would be good at determining the competence of employees. After all, there’s a monetary incentive. One would be wrong. One of two things happens. Either teams form that help each other out to get things done, or the worst politics evolves.

  5. Your article spurs a variety of feelings in me. I do not consider myself to be a great mathematician; however, I do very much enjoy math, outside of school. I am a college student of Computer Science, and through all of my schooling, according to test scores, my math skills are very average, usually a C (I am a North American student). Still, when I find myself in a situation at work where I need to use my math skills to solve problems of trigonometry, algebra, or differential or integral calculus, I do not struggle. I am able to solve such problems with relative ease. I do not believe that this is because the problems are easy or overly simple, and I do enjoy a challenge more than simple “busy-work.”

    In college most of my math grade has usually been based on work – about 60% on average. 30% has been based on test scores, and 10% has been based on whether I show up or not. So even though it would never get me a passing grade, as long as I’m never sick during the semester I can guarantee at least a 10% grade, and if I do the homework, another 60% alone. That would generally, with a flat scoring system, put me at a C average. Now if I actually do all of the homework and I’m there for the test and put any kind of effort into it, I can probably guarantee that I’ll be able to answer at least 80% of the questions correctly. If there are four tests during the semester then I can get about another 24% for my test scores. That would put me at 94% (an A grade). I wish that High School was this easy.

    Thanks for taking the time to talk about this.

  6. I’m an education student thinking long and hard about testing and how I feel about it. I agree with almost everything that you have pointed out but don’t understand why schools are centralized around standardized tests to “see where the students are at”. These tests put high pressure and anxiety on the students (as almost everyone has pointed out), so how do we know that they actually know the material?

    I’m debating how I want to set up my classroom and if tests are going to be present. From my personal education it seems as if tests are mandatory. Are they not? Do administrators state that they are optional anywhere? I guess I just haven’t figured out what I’m allowed to do yet in the classroom.

    Thanks for the assurance that school shouldn’t be all about tests.

  7. I know I am very late at responding to this post but it caught my attention. This is a very interesting subject that has been the topic of discussion in many of my most recent grad classes and grading policy committees at the school where I teach.

    I agree with everyone’s comment that a paper and pencil test does not show exactly what a student has learned. However, when we are talking about this idea of test taking, does anyone have a particular age in mind? Are we talking about students in grades K-12 or are we including college students? I ask this because one of our main concerns with teaching kids (I am now specifically talking about grades 9-12) is preparing them for college – if they so choose. If we do not prepare them with tests, then what will happen when they get there and take nothing but tests? Then we did not do our job of preparing them. Are we saying that colleges should change their methods too? Since they are essentially preparing their students for jobs, then the answer is yes, of course. How many jobs require their employees to take tests? Besides exams for licensure, I cannot think of many.

    My point – and I hate to be a bit negative – is that unfortunately this problem is larger than one originally thinks. Each grade level is required to prepare students for the next; therefore the changes must occur from first grade until the “last grade.” We, as teachers, must take it upon ourselves to use an appropriate amount of paper and pencil tests as well as project based assessments in order to reach all of our students and prepare them for all of life’s challenges/experiences ahead.
