By Joshua Benton
SAN ANTONIO – It’s within the city limits, but Harcourt Educational Measurement qualifies as its own boomtown.
Like other champions of the No. 2 pencil and the properly filled-in bubble, Harcourt has exploded in size over the last decade thanks to education reform. Its campus, a 560,000-square-foot string of buildings, is barely a year old, and already construction workers are building new offices.
But the real boom, the likes of which few industries have seen, lies ahead.
The federal education bill signed by President Bush in January requires states to test their students in reading and math every year from third grade to eighth grade by 2006. There’s also a requirement for a high school test, and science tests become mandatory for three grades in 2007.
The regimen won’t be a problem for Mr. Bush’s home state of Texas and 15 other states, which already test in all of those grades.
But 34 states have gaps in their testing programs. Some gaps, like those in Alabama, which is missing only fifth- and seventh-grade math tests, will be relatively easy to fill. Other states have lots of work to do. Nebraska, for example, didn’t test math at all last year, and tested reading at only three grades.
Someone has to design, build, refine and grade the dozens of tests that don’t yet exist. And with the testing industry already stretched by rapid expansion – it has gone from a $141 million industry to a $390 million one from 1996 to 2001, according to the nonprofit group Achieve – some are concerned that companies might not be ready to deal with the coming demand.
“I don’t see how they can do it,” said William Koch, a professor of educational psychology at the University of Texas. “I just don’t see it happening.”
Ready-made won’t work
In years past, when states or districts wanted to test their students, they usually went to a testing company and bought a ready-made exam, such as the Iowa Test of Basic Skills or the Stanford Achievement Test, out of a catalog.
But the federal legislation requires that tests be tailored to states’ own educational standards, which can vary widely. A national off-the-shelf test won’t do. The result is a major challenge for the testing industry.
“There’s never been this kind of demand for new tests,” said Gerald Sroufe, executive director of the American Educational Research Association.
San Antonio is an industry center. Harcourt, which produces the Stanford test, controls about 40 percent of the industry. That’s about the same share as its California-based rival, CTB/McGraw Hill.
“We grow and grow,” said Beverly Nedrow, who oversees Harcourt’s reading and language arts group. She said her staff has nearly tripled in her nine years at the company, and she wouldn’t be surprised to see another doubling or tripling in the next few years.
The growth is easy to see at Harcourt’s sprawling headquarters. If you want to eat at the company cafeteria, be prepared to stand in line for 20 minutes. And if you want a parking place that doesn’t require a lengthy hike, show up early. “If you leave for lunch, good luck finding a spot when you get back,” said Joyce McDonald, director of Harcourt’s Performance Assessment Scoring Center.
To protect test materials, the building operates under tight secrecy. Photography is banned in areas where there might be test questions. Security approval is required to enter the large scoring rooms, where at peak times hundreds of scorers sit and read the essays of children. The corporation’s home is marked by only a small sign that gives no clue what goes on there.
Last year, company employees made 49.5 million grading decisions – and that’s not counting all the multiple choice questions graded by machine.
Even that kind of capacity won’t be nearly enough for the millions of additional tests to be administered in coming years.
Bush administration officials have said they expect market forces to address any industry shortcomings before 2006. Small newcomers already have begun to appear, such as San Antonio’s months-old ETS K-12 Works.
But no matter how many new companies are founded, the number of qualified testing professionals is limited. In order to make tests, companies need psychometricians, and they’re in short supply.
Psychometricians design the structure of tests and fine-tune them to ensure they measure what they’re supposed to. They have in-depth training in psychology, statistics, and educational theory, typically through a doctoral program.
Other areas of the testing industry can be scaled up with relative ease. Corporations can always buy more scanners to read bubble sheets. The people who write test questions typically are former teachers, a group of which there is little shortage.
But it’s difficult to suddenly double or triple the number of psychometricians. There are no more than a few thousand in the country.
“We’re producing about the same number of psychometricians as we used to, but the demand is so much greater than it was before,” said Dr. Koch, who chairs UT’s quantitative methods unit. “All the major companies are desperately looking for qualified people. Anybody who wants a job in the testing industry can get one.”
UT’s program, like many, graduates only two or three psychometricians a year.
“It seems like a fair guess it’ll only get worse,” said Charles Lewis, director of the graduate psychometrics program at Fordham University in New York City.
He said he fears a tight supply of employees will lead to lower quality for the tests that are produced. In the last year, several testing companies have had to fix misgraded exams that in some cases caused students to repeat classes.
“The more tests you produce without sufficient technical support, the greater the chance there will be some low quality tests and there will be a mistake that affects students,” Dr. Lewis said.
With capacity problems looming, states and companies are investigating ways around building new tests for every state. Some are looking into creating consortiums among states with similar standards, so they could join together to build a test of their own. Other states that don’t meet the federal guidelines are lobbying to be exempted from the requirements.
“We’re hoping we don’t have to change our direction,” said Betty VanDeventer, spokeswoman for the Nebraska Department of Education. Nebraska now allows school districts to determine what tests they will use to meet the state’s minimal testing requirements.
Dr. Margie Jorgensen, a Harcourt vice president, said she expects some states to create half-new tests that combine an off-the-shelf test such as the Stanford with a smaller, state-specific test. California, Hawaii and Delaware already have such a system.
“It’s been a big success for us to be able to test on our standards and compare our students to others around the country” who take the Stanford, said Robin Taylor, Delaware’s associate secretary of assessment and accountability. Using a ready-made test can also save cash-strapped states money.
At the grading end of the process, much of the work is automated. Banks of industrial scanning machines run over the familiar filled-in bubbles on answer sheets and generate scores quickly.
The more difficult problem comes when grading answers that aren’t multiple choice – essay questions or short, open-ended responses. Traditionally, those have required hiring human graders, often retired or vacationing teachers.
But getting qualified graders – willing to work long hours in the short bursts required by testing calendars – isn’t always easy. As a result, companies such as Harcourt are looking hard at artificial intelligence: computer programs that can read and grade essays as a human would. Dr. Jorgensen said that AI technology has advanced to the point that a computer grader is virtually indistinguishable from a human.
“It feels to me that it’s so close to being doable,” she said. “I think in a couple of years you’ll see AI being used to grade a major test.” Both Harcourt and CTB/McGraw Hill now offer AI grading of essays on selected writing tests.
As the push to 2006 continues, companies are likely to seek out whatever methods they can to meet the demand.
“We’re going to encounter huge deficiencies,” Mr. Sroufe said. “The question is how it’ll be dealt with.”