Beyond Test Scores: Improving Research Evidence on Education

May 11, 2016

The central question in education policy—How well are schools preparing students for their futures?—cannot be answered by looking at test scores alone.

Last month, the Journal of Policy Analysis and Management released our paper, coauthored with Tim Sass of Georgia State and Ron Zimmer of Vanderbilt, which provided the first large-scale evidence of the effects of charter schools on the earnings of their graduates, years after graduation. Using statewide data from Florida, we found that students who attended charter high schools not only experienced higher graduation rates and higher rates of college entry (relative to a comparison group), but also had higher rates of college persistence and higher earnings in their mid-20s.

These findings were interesting, not only because they provided new evidence on charter schools’ long-term effects, but also because the positive effects on educational attainment and earnings would not have been predicted from the students’ test scores. The same charter schools that seem to be producing positive long-term effects did not have positive effects on short-term test scores.

Findings of positive long-term effects—even, in many cases, without positive test score effects—have been observed in other studies of small, mission-driven high schools of choice, such as New York City’s small high schools, Washington, DC’s voucher schools, and Catholic high schools. All of these schools appear to be helping their students build skills that have substantial long-term benefits that are not captured in test scores. Interestingly, the pattern of long-term benefits without persisting test score effects has also been observed in research on the effects of high-quality early childhood programs.

All of these findings suggest that, in matters of education policy, improving research evidence will require attention to a wider range of outcomes than test scores alone—at least until we have tests that can effectively measure the “noncognitive” skills and behaviors that are necessary for success in school, work, citizenship, and life. (I confess that I sometimes envy my colleagues who work in health research, where what constitutes a favorable outcome is more often well defined and uncontroversial than in education.)

Specifically, researchers (and policymakers and educators) need measures that predict long-term outcomes but that don’t require waiting many years to collect. Even though I’m proud of our new study of the long-term effects of charter schools, it is unfortunate that the field had to wait more than 20 years after the first charter schools opened to produce such evidence. The students included in our study enrolled in high school between 1998 and 2001—quite a long time ago. It would be useful if future studies could examine predictors of long-term outcomes for current students, making findings much more timely.

Fortunately, measures of noncognitive skills and behaviors for current students are receiving considerable attention from researchers. Policymakers are taking notice as well. In the new Every Student Succeeds Act, Congress modified the outcome-based accountability requirements of the old No Child Left Behind Act, giving states new authority to couple test-based measures with another, unspecified measure of student success or school performance. A group of school districts in California has already chosen to add student “grit” to the test-based outcomes it measures in its schools.

Unfortunately, even the scholar who has brought grit to the forefront of public discussion (the University of Pennsylvania’s Angela Duckworth) acknowledges that current measures of grit are flawed. More generally, the field does not (yet) have many good measures of noncognitive skills that can be implemented at modest cost, that produce results consistent across schools and student populations, and that reliably predict students’ long-term outcomes. For example, as others have pointed out, existing measures of grit are plagued by “reference bias”: students tend to rate themselves relative to others around them, which undermines comparisons across schools.

In sum, the new attention to noncognitive measures of students’ skills and behavior is well supported by a growing body of research that demonstrates substantial long-term effects that are not strongly related to test scores. We researchers have a lot more work to do to improve research evidence on educational interventions and outcomes.

Read more about our education work.


The opinions expressed on the Evidence in Action blog are those of the author(s).
