Testing the Test
A 16-year-old is getting ready to earn her driver’s license. She gets through the eye exam and passes the driving portion, although she’s a little shaky on parallel parking. So far, so good. But on the written portion she flunks entirely: she can’t identify a single traffic sign. The state decides that although she didn’t pass, she’s quite likely to learn the material later and would probably pass the test another time. She doesn’t get her license, but the driving school that trained her gets to count her as a success.
The Texas Education Agency’s latest testing policies work in much the same way: they allow schools to count students who fail a state exam as passing if they are projected to pass later. At Houston’s Benavidas Elementary, for instance, all that kids who pass the state reading and math exams need to do is show up for their writing test. Even if they earn no credit for their writing, they’ll be projected to pass the writing exam and will therefore count toward the school’s passing rate in writing. Depending on the combination of scores and the school’s history, a student can score literally zero points on a state exam and the school can still count that student as passing. Schools can thank the Texas Projection Measure for that. (Read our earlier coverage of the measure here.)
In Texas, a single set of standardized tests, the TAKS, serves to answer a wide range of questions, from how much a single student learned to whether a school as a whole adequately educates minority and at-risk kids. Each year, the Texas Education Agency rolls out the latest TAKS results, and the news is generally good. More kids are proficient in math, writing, you name it; this year scores improved in every grade except sixth and eighth. But whether the tests actually reflect high standards is a different question, and one lawmakers are raising. After all, last year only 43 school districts earned the top rating. Now there are 117, and all but one of the new districts achieved that status thanks to projections that its students will pass. Call it predicted excellence. As more and more rides on test performance, the state has adopted a variety of ways to present the statistics. Rep. Scott Hochberg, D-Houston, argues that they mask the system’s problems.
“You set out what you call a ‘tough accountability system,’ and then you find a dozen ways to make it weaker,” says Hochberg.
Hochberg, who chairs the appropriations subcommittee on education, has made it his mission to iron out some of the biggest concerns. His primary target: what counts as “passing.” Schools, obviously, get credit for each student who passes. But some handy loopholes let schools boost their ratings with students who aren’t quite at the bar. Thanks to the Projection Measure, used last year as part of a voluntary federal pilot program, schools can also count students who fail a test but are projected to pass later based on their school and their other scores. And while a school must also show passing rates for subgroups like minority and at-risk students, districts with many different subgroups (Black fourth graders, at-risk fifth graders, and so on) can still get credit even if one group isn’t performing up to standard. (Some districts can discount as many as four subgroups.)
According to Professor Ed Fuller, the subgroup allowance isn’t such a terrible thing. “If you have 25 cells, and one is low performing, does that really mean your whole school is low performing?” he says. “Not so much.”
That’s little comfort to Hochberg, who sees the agency’s policy as contradicting legislative goals. “You can take a mulligan on a few of them!” he exclaims. “That was never the intention.”
But it’s Hochberg’s challenge to the projection policy that has created a buzz among education experts. At an appropriations subcommittee meeting on education last week, the infamously nerdy legislator came armed with data to question Education Agency officials. (The agency’s assessments expert, Criss Cloudt, did not respond to interview requests.) Although the TAKS currently tests only broad subject areas (like reading and math) rather than specific courses (like Texas history or algebra), the projection measures are still hard to decipher. Hochberg spent time running different combinations of test scores for schools in his district through the projection models and discovered that at some schools, kids who failed certain tests miserably still counted as “passing.” He had expected to put his questions to Education Commissioner Robert Scott, but when Scott did not appear at the meeting, Hochberg pounced on Cloudt and the agency’s chief operating officer, Adam Jones. He hammered the officials on the statistical basis for the projection measures but found little comfort in their testimony.
“I’m starting to get to the point where I don’t trust any assertions coming from the agency,” he says a few days after the hearing. “…As I walked out of the hearing, the administration was still adamantly defending the system as it stands. I didn’t hear any recognition by [the agency] that there was anything wrong with the current system.”
He points to the moment when he asked Cloudt how accurate the projection measures actually are. “I think they’re quite good,” she said, pointing out that they’re right 90 percent of the time. It took Hochberg only a few seconds to find the flaw in her logic. Kids who are safely passing or clearly failing are easy to predict: they’ll likely keep passing or failing, as the case may be. But those aren’t the kids the projection measure affects.
“The part that you’re using for accountability purposes are the kids that are below standard that you’re projecting upward to meet standard,” Hochberg said. That is, the ones on the borderline between passing and failing.
For those kids, the accuracy rate turns out to be much lower: between 52 and 80 percent. “The place where you’ll have your most inaccurate projections are right around the passing,” Cloudt acknowledged under Hochberg’s questioning. In other words, the projection calculator’s main job is predicting how borderline students will do, and that is exactly where it is least accurate.
The current system had already come under fire. The Houston Chronicle’s Rick Casey slammed the agency for the projection measures and, before that, for the fact that students can pass some TAKS tests by answering only 44 percent of the questions correctly. Hochberg isn’t the only lawmaker worked up about the news.
“The idea that we’re gonna inflate a school’s ranking based on what we predict what they’re going to do the next three years into the future,” says San Antonio Rep. Mike Villarreal. “That’s just crazy!”
Villarreal worries that the data is becoming less and less useful, and Fuller echoes the sentiment, concerned that the tests themselves are becoming less effective measures. He says the projection measure Texas uses doesn’t consider how students actually perform or how much they improve from one year to the next. And in any case, he says, rewarding schools for their percentage of students passing isn’t the best measure. A school that gets many borderline students to answer a few more questions correctly and pass the test gets credit from the agency, while a school that dramatically increases the number of questions its students get right, but doesn’t get them up to passing, gets almost no credit. “The school with the greater increase in passing doesn’t have greater growth,” Fuller points out. Under the agency’s measure, “it doesn’t matter how much better they do,” he says. “It’s just: did they get over a certain hurdle?”
The agency, on the other hand, isn’t apologizing for the current system. Agency staffers acknowledged Hochberg’s arguments and did not dispute his facts, but they did not waver in their support of the system and stood firm that their data was valid. In fact, while Hochberg hopes the agency will suspend the measures, Cloudt suggested they would remain useful when the agency begins implementing a new battery of tests in 2012.
“At the end of the day,” chief operating officer Jones told the lawmakers, “these numbers are driven by performance.”
And it’s that line of thought that leaves Hochberg concerned. “The argument that I guess was being made [by the agency] was, ‘Well this wasn’t perfect but it was the best we could do under the circumstances,’” Hochberg says a few days after the hearing. “And that’s not good enough. This is far from perfect—it’s seriously flawed.”
Updated to reflect new information.