The Undoing Project: A Friendship that Changed the World



If human judgment was somehow inferior to simple models, humanity had a big problem: Most fields in which experts rendered judgments were not as data-rich, or as data-loving, as psychology. Most spheres of human activity lacked the data to build the algorithms that might replace the human judge. For most of the thorny problems in life, people would need to rely on the expert judgment of some human being: doctors, judges, investment advisors, government officials, admissions officers, movie studio executives, baseball scouts, personnel managers, and all the rest of the world’s deciders of things. Hoffman, and the psychologists who joined his research institute, hoped to figure out exactly what experts were doing when they rendered judgments. “We didn’t have a special vision,” said Paul Slovic. “We just had a feeling this was important: how people took pieces of information and somehow processed that and came up with a decision or a judgment.”

Interestingly, they didn’t set out to explore just how poorly human experts performed when forced to compete with an algorithm. Rather, they set out to create a model of what experts were doing when they formed their judgments. Or, as Lew Goldberg, who had arrived in 1960 at the Oregon Research Institute by way of Stanford University, put it, “To be able to spot when and where human judgment is more likely to go wrong: that was the idea.” If they could figure out where the expert judgments were going wrong, they might close the gap between the expert and the algorithms. “I thought that if you understood how people made judgments and decisions, you could improve judgment and decision making,” said Slovic. “You could make people better predictors and better deciders. We had that sense—though it was kind of fuzzy at the time.”

To that end, in 1960, Hoffman had published a paper in which he set out to analyze how experts drew their conclusions. Of course you might simply ask the experts how they did it—but that was a highly subjective approach. People often said they were doing one thing when they were actually doing another. A better way to get at expert thinking, Hoffman argued, was to take the various inputs the experts used to make their decisions (“cues,” he called these inputs) and infer from those decisions the weights they had placed on the various inputs. So, for example, if you wanted to know how the Yale admissions committee decided who got into Yale, you asked for a list of the information about Yale applicants taken into account—grade point average, board scores, athletic ability, alumni connections, type of high school attended, and so on. Then you watched the committee decide, over and over, whom to admit. From the committee’s many decisions you could distill the process its members had used to weigh the traits deemed relevant to the assessment of any applicant. You might even build a model of the interplay of those traits in the minds of the members of the committee, if your math skills were up to it. (The committee might place greater weight on the board scores of athletes from public schools, say, than on those of the legacy children from private schools.)

Hoffman’s math skills were up to it. “The Paramorphic Representation of Clinical Judgment,” he had titled his paper for the Psychological Bulletin. If the title was incomprehensible, it was at least in part because Hoffman expected anyone who read it to know what he was talking about. He didn’t have any great hope that his paper would be read outside of his small world: What happened in this new little corner of psychology tended to stay there. “People who were making judgments in the real world wouldn’t have come across it,” said Lew Goldberg. “The people who are not psychologists do not read psychology journals.”

The real-world experts whose thinking the Oregon researchers sought to understand were, in the beginning, clinical psychologists, but they clearly believed that whatever they learned would apply more generally to any professional decision maker—doctors, judges, meteorologists, baseball scouts, and so on. “Maybe fifteen people in the world are noodling around on this,” said Paul Slovic. “But we recognize we’re doing something that could be important: capturing what seemed to be complex, mysterious intuitive judgments with numbers.” By the late 1960s Hoffman and his acolytes had reached some unsettling conclusions—nicely captured in a pair of papers written by Lew Goldberg. Goldberg published his first paper in 1968, in an academic journal called American Psychologist. He began by pointing out the small mountain of research that suggested that expert judgment was less reliable than algorithms. “I can summarize this ever-growing body of literature,” wrote Goldberg, “by pointing out that over a rather large array of clinical judgment tasks (including by now some which were specifically selected to show the clinician at his best and the actuary at his worst), rather simple actuarial formulae typically can be constructed to perform at a level of validity no lower than that of the clinical expert.”

So . . . what was the clinical expert doing? Like others who had approached the problem, Goldberg assumed that when, for instance, a doctor diagnosed a patient, his thinking must be complex. He further assumed that any model seeking to capture that thinking must also be complex. For example, a psychologist at the University of Colorado studying how his fellow psychologists predicted which young people would have trouble adjusting to college had actually taped psychologists talking to themselves as they studied data about their patients—and then tried to write a complicated computer program to mimic the thinking. Goldberg said he preferred to start simple and build from there. As his first case study, he used the way doctors diagnosed cancer.
