Twenty-three years ago, I spent a memorable long weekend at the J. Erik Jonsson Center in Woods Hole, a classic old Cape Cod mansion with commanding views of Quissett Harbor and Buzzards Bay. For years this magical spot has belonged to the National Academy of Sciences, which uses it for conferences and research convocations. My companions in 1998 were an eccentric and illustrious committee of psychometricians — scientists who study educational and psychological tests.
The National Academy had assembled this committee to study the uses and misuses of standardized tests. They had been working on the study for two years, and I was chosen to edit their final report because I had edited the Harvard Education Letter and other publications. We gathered in Woods Hole to review and discuss the resulting volume.
High-stakes tests were being adopted all over the country as the solution to shockingly unequal schools, including in Massachusetts, which created the MCAS in 1993. The tests were being used to decide whether students should be tracked into fast or slow classes, promoted to the next grade, or allowed to graduate. Our committee looked into all of those practices and their effects on students, both good and bad, with the mind-boggling precision that comes from the combination of supreme expertise and total obsession.
I learned a lot about testing in those years.
Tests had never bothered me when I was a kid, because I was always good at them. And standardized tests have that ineffable aura of exactness with their numerical verdicts. Why do SAT scores range from 200 to 800? Who decided that “800” equals perfection? No matter. We all accepted it unquestioningly.
The most remarkable person on the National Academy testing committee was Sam Messick, a senior scientist at the Educational Testing Service. I will never forget him talking about the principle of consequential validity, which he had originated.
Messick knew better than anyone else all the ways in which tests go wrong. He knew why a child taking the exact same test on two different days so often gets two wildly different scores. He knew why tests so often fail to measure the things their creators think they are measuring. Sam said that the only valid reason to test children was to help them. Therefore, any use of a test that hurts rather than helps the child is by definition invalid.
I hear the echo of Messick’s words in this week’s front-page story on testing by Paul Sullivan, who asks, “Is MCAS helping kids or hurting them?”
Sam Messick’s consequential validity argument remains controversial, especially in the multi-billion-dollar standardized testing industry. Numbers retain their power over us, even as we wonder what they really mean.