COLUMN: Elite schools make room for mediocre rich kids

By Joshua Benton
Staff Writer

Page 2B

If you or your child is applying to a selective college this year, here’s a reading assignment: Pick up a copy of The Price of Admission, a new book by Wall Street Journal reporter Daniel Golden.

It’ll either give you a useful view into how the elite admissions game works or just leave you disgusted about the whole enterprise.

Actually, probably both.

Mr. Golden’s subject is the root unfairness in the way elite colleges choose who wins the coveted spots in their freshman classes.

Some folks complain about admissions policies that favor minority students. But Mr. Golden shows the degree to which the bias actually moves in the opposite direction: toward children of privilege.

We all know wealthy kids have enormous advantages not available to others. Their parents can afford score-boosting SAT prep classes and private school tuition. They can give their children an upbringing that provides endless educational opportunities. Those can all give the rich an edge.

But I’m not talking about those kids – the ones who, even considering their privileges, earn their spots. I’m talking about kids who aren’t remarkably bright but still get into top colleges because of who their daddy is.

The most obvious way that’s done is by legacy preference, the edge that colleges give to the children of their alumni. It’s probably the most effective way colleges encourage – some might say extort – donations from their former students.

For instance, at Harvard the admissions rate for legacies is four times the rate for the hoi polloi. Is it because those kids are unusually smart? Nope – they actually have lower average SAT scores than other admitted students.

Mr. Golden, himself a Harvard alum, details the ways colleges chase after the children of the rich and powerful, like paparazzi pursuing Paris Hilton.

He shows how Al Gore’s son earned a questionable admission to Harvard, and how presidential niece Lauren Bush got into Princeton despite below-average SAT scores, mediocre grades at her Houston prep school and not bothering to apply until a month after the deadline.

I’d like to see a working-class kid from South Dallas try that trick.

Actually, North Texas, home to more than its fair share of rich folks, shows up a few times in Mr. Golden’s narrative. Members of Fort Worth’s Bass family have given tens of millions to their alma maters, and that’s helped when it comes time for their children to apply. Mr. Golden reports that one Bass daughter got into Stanford despite being in the middle of her own high school class and having an SAT score that ranked her deep in the bottom quartile of Stanford freshmen.

Mr. Golden writes about how, beginning in the 1970s, Duke – which comes out of this book looking awful – targeted the wealthy parents of Dallas prep schools because the university was looking for rich families to turn into donors, no matter how mediocre their kids’ academic records were.

“We really worked Dallas,” a former Duke associate director of admissions told Mr. Golden. It was all part of Duke’s hunt for members of the “socioeconomically high-end.”

And for the rich legacies who still can’t sneak into a school, there’s often a back door. Harvard, for instance, maintains something called the “Z-list” for students who can’t survive the normal admissions process. They’re told they can enroll if they just wait a year. Not so coincidentally, about three-quarters of the students on the Z-list are legacies.

If this seems like a personal issue to me, it’s probably because it is.

I went to Yale. Some might call that casting against type. I grew up in a poor small town in south Louisiana. No one in my family had ever been to college, and most hadn’t graduated from high school. It took $100,000 in grants, $16,000 in student loans, and a couple campus jobs to make Yale affordable.

I knew some of the people Mr. Golden is talking about. The prep-school kids with B-minus minds. The ones whose last names were on campus buildings.

They were a small minority of the student body, most of which was awe-inspiring. But there were some I couldn’t stop comparing to the brilliant kids who I knew had gotten rejection letters.

I enjoyed my time at Yale, and I wouldn’t mind if my kid went there someday. But Yale, with its endowment of $15 billion, doesn’t need my money. It’s depressing how many of my classmates preach the need to donate cash – not out of affection for their alma mater, but solely so they can be labeled a “productive alum” and someday get their own kids into Yale.

Is any of this really surprising? I mean, isn’t it a given that connections matter, that a kid whose last name is Bush, Bass or Kennedy is going to have an edge?

I suppose. But America’s elite colleges make such a fuss about their high-minded meritocracy that it’s disgusting to see them dance like eager courtiers.

The American model is supposed to promote social mobility, not an inherited aristocracy. College admissions is a zero-sum game. For every C-student rich kid who gets into Harvard, there’s a far more qualified middle-class kid who gets stuck with his safety school.

And those spots in the freshman class are more sought after than ever. When I applied to Yale in 1993, the university admitted 19 percent of all applicants. Today, it’s closer to 9 percent.

Elite schools, including Yale and Harvard, have made efforts in the last few years to increase the number of low-income students they attract – mostly by offering more generous financial aid packages.

But as long as they keep holding the door open for the middling children of aristocrats, they’ll be blocking the path for everyone else.

Cheating: It’s in the numbers; Formulas can provide powerful evidence of misconduct on exams

By Joshua Benton
Staff Writer

Page 1A

It’s the sort of case you might expect Encyclopedia Brown to tackle.

Two kids seem to have cheated on Professor Harpp’s final exam. Can he prove the culprits did it – before it’s too late?

But when McGill University professor David Harpp suspected some of his students were up to no good, he didn’t hire a boy detective for a shiny new quarter. He did the job himself.

He devised a statistical method to determine whether two students were copying test answers from each other. He found that, on a 98-question multiple-choice test, the pair of students had 97 answers exactly the same – including 23 wrong answers.

Confronted with the evidence, the students confessed.

To the untrained observer, it may seem strange that cheating can be reliably detected with statistics and formulas – the kind of work Texas officials have hired an outside firm to perform. But decades of research around the world have produced methods that prove quite effective at smoking out cheaters in ways even the best proctors often can’t.

“We had always worried that cheating was happening, but we had to find a way to figure out who was doing it,” said Chris McManus, a professor of psychology and medical education at University College London who hunts cheaters.

In Texas, the test-security firm Caveon has identified 699 Texas schools – nearly 10 percent of the state’s total – where cheating may have occurred. The Texas Education Agency is planning how it will deal with those schools, some of which will be the targets of a full state investigation.

Caveon used several methods to look for bad behavior. But the most common problem it found was classrooms or schools where a group of students had identical or nearly identical answer sheets, suggesting they may have copied from one another.

That method of detecting cheating has a long history and, according to researchers in the field, does a good job of identifying the right suspects. And those researchers say Texas could do a better job of preventing students from cheating in the first place.

Many techniques

Academics have come up with dozens of methods, dating back nearly a century. They differ in details, but nearly all are founded on one key principle: It’s rare for a pair of students to make exactly the same mistakes on a multiple-choice test.

Having lots of identical correct answers, of course, doesn’t raise red flags. If the correct answer to Question 21 is “B,” you’d expect many students to choose it. But when students get many questions wrong in exactly the same way – particularly if few other classmates made the same mistakes – it can be a sign something is up.
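To see why shared wrong answers are so telling, consider a quick simulation – a purely illustrative Python sketch with made-up parameters, not any researcher’s actual model. Suppose two students answer independently, both miss the same 30 questions on a 100-item test, and each wrong answer is a random pick among three distractors:

```python
import random

# Illustrative sketch only (made-up parameters, not any firm's real model):
# if two students answer independently, how many identical wrong answers
# should they share by chance? Assume both miss the same 30 questions and
# each wrong answer is a uniform pick among 3 distractors.
TRIALS = 100_000
SHARED_MISSES = 30
DISTRACTORS = 3

counts = []
for _ in range(TRIALS):
    identical = sum(
        random.randrange(DISTRACTORS) == random.randrange(DISTRACTORS)
        for _ in range(SHARED_MISSES)
    )
    counts.append(identical)

print("mean identical wrong answers:", sum(counts) / TRIALS)  # about 10
print("share of trials with 20 or more:",
      sum(c >= 20 for c in counts) / TRIALS)  # tiny - roughly 0.0002
```

Real tests are messier – some distractors are more tempting than others – which is exactly why the serious methods weight each shared wrong answer by how often the rest of the class chose it.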

The earliest known statistical test for cheating was explained in a 1927 scholarly paper. During an exam, four students were observed acting suspiciously. The instructor thought to examine the number of wrong answers the four had in common with each other on the 149-item test.

The suspects had 31, 28, 25 and 17 wrong answers, respectively, in common. The average for the rest of the class: only four.

The students denied cheating at first, but three “quickly confessed guilt when confronted with the evidence,” the paper’s author wrote.

Dr. Harpp is a chemistry professor at McGill, one of the most prestigious universities in Canada. He didn’t know much about cheating research in 1989, when one of his students told him that two peers had shared answers on a final exam.

“I felt I was being ripped off, which I was,” Dr. Harpp said.

The student worried about being a rat. But he agreed to tell Dr. Harpp the suspects’ names on the condition that no disciplinary action would be taken based solely on his accusation. Dr. Harpp had to find a statistical way to detect copycats.

He and a colleague, Jim Hogan, wrote a computer program that, for every possible pair of students, compared the number of identical wrong answers with the number of questions the pair answered differently. Sure enough, the results for the two students stood out.
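The Harpp-Hogan program itself isn’t published, but the comparison it performs can be sketched in a few lines of Python. Everything below – the data layout, the names and the cutoff – is illustrative, not their actual code:

```python
from itertools import combinations

# Sketch of the pairwise screen described above (the original program is
# not public; data layout and cutoff here are illustrative). Each answer
# sheet maps question number -> chosen option.
def suspicion_ratio(sheet_a, sheet_b, answer_key):
    """Exact errors in common (EEIC) divided by questions answered differently."""
    eeic = sum(1 for q in answer_key
               if sheet_a[q] == sheet_b[q] != answer_key[q])
    differences = sum(1 for q in answer_key if sheet_a[q] != sheet_b[q])
    return float("inf") if differences == 0 else eeic / differences

def flag_pairs(sheets, answer_key, cutoff=1.0):
    """Check every possible pair of students; return those above the cutoff."""
    flagged = []
    for a, b in combinations(sorted(sheets), 2):
        ratio = suspicion_ratio(sheets[a], sheets[b], answer_key)
        if ratio > cutoff:
            flagged.append((a, b, ratio))
    return flagged
```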

But the analysis found more – 18 suspect students in all. Dr. Harpp then obtained exam seating charts from the registrar’s office. It turned out that every suspect pair had sat together on test day.

“I was flummoxed,” he said. “I thought: What am I going to do?”

He gathered data from dozens of other exams and found the same patterns. His method consistently found that the results for between 3 percent and 8 percent of students were suspicious – a range that is common in cheating research. And in each instance, the pairs were seated together.

The results were considered strong enough that McGill now uses Dr. Harpp’s method to check for cheating on all the university’s multiple-choice final exams.

“We want to create an ethos of academic integrity on campus,” said Morton Mendelson, McGill’s deputy provost. “Our goal is not to catch cheaters. Our goal is to persuade students they really do want to be honest. And that if they’re not honest, they will be caught.”

British research

Like Dr. Harpp, Chris McManus does his cheating research as a sideline to his main academic work, which is training British doctors. For a study published in The British Medical Journal last year, Dr. McManus looked at more than 11,000 people who had taken the postgraduate exams given to aspiring British pediatricians.

The exams were taken at dozens of testing sites in Britain and around the world, on many dates. The study used a computer program of Dr. McManus’ design, Acinonyx. (Showing no field is immune from puns, the name derives from Acinonyx jubatus, the scientific name for the cheetah.)

The analysis found 13 pairs of answer sheets that suggested copying. The computer had no way of knowing where or when the tests were taken. But test records confirmed that all 13 pairs had taken the examination at the same locations and on the same days.

Seating charts had been kept for only six of the 13 pairs, but they showed that all six were sitting next to each other on test day. It was slam-dunk evidence of cheating.

“We went back and sat in some of the rooms, and it was very obvious that one could cheat if one wanted to,” Dr. McManus said.

Supporting evidence

The great risk of a statistical tool is the possibility of false positives. The best way to protect against them is to seek supporting evidence, such as seating charts. Texas does not require schools to record where students sit on test day, nor does it mandate how they must be arranged.

To guard against mistakes, researchers with different methods compare their findings to test their validity. While little is known about the specifics of Caveon’s methods in Texas, most approaches broadly agree on which students they identify.

“All of these methods are very much consistent for students with a high level of suspicion,” said George Wesolowsky, a professor of management at McMaster University who has created his own formula. “It’s on the marginal pairs where methods might disagree.”

For instance, Dr. Harpp has run much of his testing data through a popular formula designed by a professor at Virginia Tech. The two methods are substantially different, he said, but they nearly always flag the same students as suspicious.

“We wound up with the same conclusions the vast majority of the time,” he said. “I can’t say it’s totally foolproof, but it’s pretty, pretty close.”

Steps working

At McGill, Dr. Harpp has encountered many cheating schemes in the years since 1989. One echoed the famous restaurant shooting scene in The Godfather: A student recorded an exam’s answers on his graphing calculator, then hid it in a bathroom for another student to retrieve.

But the number of cheaters showing up in Dr. Harpp’s statistical samples has plummeted over the past decade. That doesn’t necessarily mean students have become more honest. Instead, the university has used Dr. Harpp’s findings to change the way final exams are administered.

First, student seating is assigned randomly on exam day, so students can’t pick their neighbors. That almost completely eliminates cooperative cheating, in which one student agrees to secretly supply answers to a friend.

In some cases, exams in different subjects are given in the same room, with each subject assigned to alternating rows of desks. That way, the people seated next to you aren’t even being tested on the same subject.

Second, professors prepare different, scrambled versions of their exams. The questions are identical, but they appear in a different order. That makes a glance at another student’s answer sheet useless to an aspiring cheater, because his neighbor’s Question 6 may be his Question 39.

“A scrambled exam isn’t that difficult to do,” Dr. Harpp said. “The temptation to cheat is removed, in a quiet, dignified way.”
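He’s right that it takes little machinery. As a rough illustration – a generic sketch, not McGill’s actual software – each exam version just shuffles the master question order and carries its own remapped answer key:

```python
import random

# Generic sketch of exam scrambling (not McGill's actual software): shuffle
# the master question order per version and remap the answer key to match,
# so one version's Question 6 may be another version's Question 39.
def make_version(questions, answer_key, seed):
    rng = random.Random(seed)  # seeded so each version is reproducible
    order = list(range(len(questions)))
    rng.shuffle(order)
    shuffled_questions = [questions[i] for i in order]
    shuffled_key = [answer_key[i] for i in order]
    return shuffled_questions, shuffled_key

# Hypothetical three-question master exam, two scrambled versions.
master = ["Q: boiling point of ...", "Q: molar mass of ...", "Q: pH of ..."]
key = ["B", "D", "A"]
version_a = make_version(master, key, seed=1)
version_b = make_version(master, key, seed=2)
```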

Since those reforms were put into place, answer copying has nearly disappeared from his statistical samples, he said.

At McGill, Dr. Harpp said, a statistical finding is considered strong evidence in disciplinary proceedings. But for a student to earn official sanction from the university, it generally must be supported by other evidence such as a seating chart showing the students next to each other.

Going unpunished

But not every place is as aggressive as McGill. On the British pediatrics test Dr. McManus studied, none of the students identified as cheaters were punished – despite strong statistical evidence and proof the offenders were seated next to one another.

“Colleges are very unwilling to act on the data,” he said. “If you don’t get a confession, there’s not much you can do about it.”

One problem: Knowing that two answer sheets were identical doesn’t tell you whether both parties were in on the cheating. Did one simply have a good view of an innocent’s answer sheet?

Students are also quick to argue that their similar wrong answers are simply the result of studying with their friends – or being taught poorly by the same teacher.

“That’s a crock,” Dr. Harpp said. If that were true, he said, you’d find suspiciously similar answer sheets for friends sitting apart as often as for friends sitting together.

That’s not what the numbers show, he said. Best friends, identical twins, and even a student taking the same exam twice – they all produce answer sheets different enough to avoid a flag, as long as they aren’t sitting next to each other.

In addition, few organizations that rely on testing are excited about uncovering cheaters in their midst. Dr. McManus initially wanted to study a different medical exam. But the organization that produced it was unwilling to let the results be published, he said, forcing his shift to the group that gives the pediatrics test.

“Most exam boards see it as an indictment of themselves,” he said. “They don’t like to publicize it.”

It can be an uphill battle to convince people that statistics can spot human foibles, often better than human investigators. “If you’re trying to investigate immoral or illegal activities,” he said, “people don’t tell you the truth.”

SIDEBAR: Crunching numbers to find Texas cheats; Firm focuses on schools with high number of suspicious results

By Joshua Benton
Staff Writer

Page 8A

Caveon, as a for-profit company, has declined to reveal how, exactly, it does its work.

“Companies don’t publish much about their methods, because then everyone could do it,” said Chris McManus, a professor of psychology and medical education at University College London who has researched cheating.

But Caveon’s report to state officials offers clues to how the company crunches its numbers.

It appears that, like many other cheating researchers, Caveon focuses on wrong answers, not correct ones. If two brilliant students both got perfect scores on the TAKS, Caveon wouldn’t consider that suspicious – even though, by definition, all their answers would be exactly the same.

An appendix to Caveon’s report says that the company calculates the probability that pairs of students would have the same answers if they had acted independently. That’s similar to a portion of the method used at Canada’s McGill University. But it’s unclear how similar two students’ answer sheets must be to trigger Caveon’s suspicion.

There’s one big difference between Caveon’s methods and those of most academics in the field. Technically speaking, Caveon isn’t searching for students who might be cheating. It’s searching for classrooms and schools where an unusually high number of students might be cheating.

Caveon’s analysis expects each classroom to have a certain share of students with very similar answer sheets, based on statewide averages that vary by grade and test. In third-grade math, for instance, Caveon expects about 1.2 percent of answer sheets to look suspicious. In 11th-grade math, that figure is 6.1 percent.

Caveon flagged a school or classroom only if it had many more suspicious answer sheets than other schools in the state. An 11th-grade math class where 6 percent of answer sheets looked suspiciously similar, for instance, wouldn’t be flagged – because that would be merely average in Texas.
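One plausible way to formalize that kind of flag – Caveon’s real cutoff isn’t public, so the code below is only an illustration – is to ask how improbable a classroom’s count of suspicious sheets would be if the statewide base rate applied:

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) when X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 11th-grade math, statewide base rate of about 6.1 percent (from the report).
# Two suspicious sheets in a 30-student class is unremarkable ...
print(binomial_tail(30, 2, 0.061))   # about 0.55 - no flag
# ... but 10 in the same class would be far beyond chance.
print(binomial_tail(30, 10, 0.061))  # about 7e-6 - flag for review
```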

Similar answer sheets are the most common suspicious pattern Caveon flagged in schools, but the company also screened for three other kinds of potential problems:

* Schools where students had unusually large jumps in test scores.

* Schools where answer sheets had unusually high numbers of erasures that changed wrong answers to right ones.

* Schools where students had unusual answer patterns, such as answering difficult questions with ease but missing easy ones.

SIDEBAR: How wrong answers point to wrongdoing

By Joshua Benton
Staff Writer

Page 9A

McGill University professor David Harpp has come up with one of the more straightforward statistical methods for teasing out cheaters. Here’s how he might catch two students, Jack and Jill, who are copying answers off each other on a 100-question multiple-choice exam. Let’s say both got C’s on the test – Jack a 75, Jill a 72.

Step 1: Determine how many questions the students answered differently.

Jack and Jill missed roughly the same number of questions – but that doesn’t mean they missed the same questions. Let’s say that there were five questions Jack answered correctly that Jill missed; two questions that Jill got right that Jack missed; and three other questions that both missed but in different ways. That would equal 10 questions answered differently.

Step 2: Determine how many questions the students answered incorrectly and identically.

Upon examining the answer sheets, it turns out that Jack and Jill had 20 questions they both answered incorrectly – and in the same exact way.

Step 3: Determine the ratio between the two numbers.

The magic formula is EEIC/D. That means “exact errors in common divided by differences.” In this case, that would be 20 exact errors in common divided by 10 – a ratio of 2.0.

In Dr. Harpp’s analysis, anything over 1.0 is considered highly suspicious. To decrease the chance of a false positive, a school could use a higher cut score, like 1.2 or 1.5. But using either setting, it looks like Jack and Jill were cheating.
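In code, plugging in the numbers from the steps above, the check is one line of arithmetic:

```python
# Jack and Jill's numbers from Steps 1-3.
exact_errors_in_common = 20          # Step 2
differences = 10                     # Step 1
ratio = exact_errors_in_common / differences
print(ratio)                         # 2.0
print(ratio > 1.5)                   # True even at the most cautious cutoff
```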

Step 4: Determine the probability that students could produce such similar answers independently.

It’s possible that the professor who wrote the test simply did a bad job. If he wrote a few questions poorly, he might have unwittingly pushed many students to choose the same wrong answers – which could artificially inflate the ratio in Step 3.

So Dr. Harpp checks to make sure that the wrong answers selected by Jack and Jill were statistically unlikely – in other words, that most other students weren’t fooled into answering the same wrong way they did. That calculation (too complex to include here) produces a measure of how unlikely Jack and Jill’s answer patterns would be, based on how other students answered.

If the calculation shows the chances that the strange answer patterns occurred naturally are very small – about 1 in 30 million or more, Dr. Harpp says – Jack and Jill will get called to the dean’s office.
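A crude stand-in for that Step 4 calculation – far simpler than Dr. Harpp’s actual one, and using invented class data – multiplies, for each shared wrong answer, the squared fraction of the class that chose that same wrong option, which is roughly the chance two independent students would both land on it:

```python
from math import prod

# Crude stand-in for Step 4 (much simpler than Dr. Harpp's real calculation;
# the class fractions below are invented). For each wrong answer Jack and
# Jill share, take the fraction of the whole class that chose that same
# wrong option; the chance two independent students both pick it is roughly
# that fraction squared. Multiply across the shared errors.
shared_wrong_option_share = {
    7: 0.10,   # question 7: 10 percent of the class chose this wrong option
    15: 0.05,
    22: 0.08,
    31: 0.04,  # ... in practice, one entry per shared wrong answer
}

p_by_chance = prod(f**2 for f in shared_wrong_option_share.values())
print(p_by_chance)  # about 2.6e-10 for these four questions alone
```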

TEA may ax test analyzer; Agency doubts level of TAKS cheating; evaluator defends data

By Joshua Benton
Staff Writer

Page 1A

The Texas Education Agency is leaning toward severing ties with the company it hired to look for cheating on the TAKS test, in part because the results have generated negative publicity for the state.

The agency also has some concerns about some methods used by the company, Caveon, officials said.

“I don’t have a lot of confidence in them anymore,” state Education Commissioner Shirley Neeley said. “Right now, I’m sure not inclined to ask Caveon for anything anymore.”

TEA hired the company after a series of stories in The Dallas Morning News that found evidence of educator-led cheating on the 2003 and 2004 Texas Assessment of Knowledge and Skills. Its job was to use statistical analysis to identify schools where cheating might have occurred on the 2005 administration of the test.

Caveon flagged 699 schools, 171 in North Texas, as suspicious in one way or another – for example, because some students’ test scores shot up too quickly or because a group of students had identical or nearly identical answer sheets. That’s nearly one-tenth of all Texas schools.

In response, TEA has created a task force on test security, added to its test-security staff and announced plans to investigate all 699 schools to varying degrees.

But Dr. Neeley and other state officials have repeatedly said they had not originally planned to investigate any of the schools, and that they are doing so primarily in response to media coverage of Caveon’s findings.

“It’s how it’s been misconstrued that’s the problem,” said Robert Scott, deputy commissioner of TEA. “The statistical analysis may be fine. But the implications have been ‘everybody’s cheating.’ ”

Even though investigations are coming, state officials have said that Caveon’s methods are not reliable enough to evaluate the test scores of individual students and were intended to uncover “anomalies,” not cheating.

“Is it worth the trauma to put districts through an investigation if a flag ultimately doesn’t turn up anything?” agency spokeswoman Debbie Graves Ratcliffe said.

The agency has declined to give investigators Caveon’s detailed findings, such as how many students in a given school are suspected of improper acts and what was suspicious about their answer sheets.

The agency has said it believes Caveon’s claims can be fully and fairly investigated without knowing which students are under suspicion. Dr. Neeley said Caveon’s findings were never intended to be broken down to the student level.

Experts who study cheating, however, say such data would be key to determining what happened on test day.

Firm backs work

Caveon, in its original contract for the work, said it would provide valid results for individuals. And both in its contract and in its report to the state, the company expressed confidence in the accuracy of its findings.

According to Caveon’s contract, its duties were to provide “summary and detailed results” that include “cheating and piracy activities by individual examinees,” “the incidence of test fraud/theft by classroom and school,” and “anomalous test results in schools that are most likely due to cheating by test administrators or outside sources.”

And in its report, Caveon terms its methods “a very conservative statistical approach” that ensures that schools flagged “will be so anomalous that reasonable explanations of these inconsistencies by referring to normal circumstances become improbable.”

Don Sorensen, Caveon’s vice president of marketing, said Friday that the company would have no comment on its work with Texas. A representative of Pearson, the state’s testing contractor, also declined to comment.

Methodology criticized

Criticisms of Caveon’s methods have centered on how it detects which schools have made suspiciously large gains in performance.

Caveon uses statewide data to determine how big a jump in TAKS performance a student typically makes from year to year. It then flags students whose gains were more than a certain amount above that state average. Schools with too many of those students get flagged as suspicious.
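A bare-bones sketch of that screen might look like this – the thresholds are placeholders, since Caveon has not published its actual cutoffs:

```python
# Bare-bones sketch of the gain-score screen (thresholds are placeholders;
# Caveon's actual cutoffs are not public). gains maps each student to this
# year's scale score minus last year's.
STATE_AVG_GAIN = 40        # hypothetical statewide average year-over-year gain
ALLOWED_EXCESS = 150       # hypothetical margin above the average
MAX_BIG_GAINER_SHARE = 0.05

def school_flagged(gains):
    big_gainers = [s for s, g in gains.items()
                   if g > STATE_AVG_GAIN + ALLOWED_EXCESS]
    return len(big_gainers) / len(gains) > MAX_BIG_GAINER_SHARE
```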

But because it uses the same standard for all schools, Caveon’s method puts additional scrutiny on high-achieving and high-wealth schools, where students tend to post bigger year-to-year gains than students at lower-performing schools.

The result is that Caveon flagged a large percentage of the high schools in well-off suburbs – schools where students generally achieve high TAKS passing rates without having to resort to cheating. Some superintendents have said they don’t trust Caveon’s gain-score methodology.

Caveon also flagged some schools for having unusual numbers of erasures on their answer sheets, where disproportionate numbers of wrong answers were changed to correct ones. But Ms. Ratcliffe said that’s not necessarily a sign of improper activity.

“There’s nothing illegal about erasing answers,” she said.

However, the validity of another flag that Caveon used to detect possible cheating – schools where very similar answer sheets suggest students copied answers off one another – is well established in academic research.

What’s next?

Cutting ties with Caveon would leave the agency with several options. It could stop looking for cheaters through statistical analysis altogether. It could try to find another company to provide the services. Or, Ms. Ratcliffe said, the agency could try to perform the statistical analysis in-house using TEA staff.

In any event, the delay will probably push back the timetable for examining scores from the spring 2006 TAKS tests.

The current investigations into the 2005 TAKS will be looking at tests given 18 months ago, long after many memories of improper activity have faded. Agency officials had said they hoped to reduce that lag time with the 2006 tests.

But Ms. Ratcliffe said that a decision on who, if anyone, will analyze those tests probably won’t be made until at least October, when the test-security task force meets again and considers the state’s options.

Breaking ties with Caveon would seem to conflict with the agency’s original stated plan, which was to get multiple years of the company’s analysis before considering whether it was worth taking any action based on Caveon’s findings.

Meanwhile, the agency has begun investigations into “close to 20 schools” based on Caveon’s report, Ms. Ratcliffe said. Those investigations began before the commissioner’s task force on test security had its first meeting last month, she said.

Ms. Ratcliffe would not say what those investigations entailed or how the schools were selected – for example, whether they were the schools set to receive state money from a special incentive program for high test scores.

“It’s a mix, schools we had some concern about,” she said. “It’s part of the first wave. We have a lot more work to do.”

The agency has not yet announced which of the Caveon schools will receive on-site visits as part of its investigation.