It’s become a kind of sport to shoot down social science claims, whether it’s the notion that you can ace interviews if you stand like Wonder Woman or charm your next date by reading two pages of “Moby-Dick” before you leave.

And now critics have taken aim at a prize target — a much-cited claim that symphony orchestras hire more women when they audition musicians behind a screen. There are big implications here, since the study has been used in diversity efforts across industries, which is why the takedown has taken off in the media.

But the blind-audition finding won’t go the way of the other results that have vanished into thin air upon more critical analysis. One reason is that blind auditions really exist; they were not a contrivance set up by scientists in a lab, as with the studies that have become infamous in the so-called replication crisis. Those mostly relied on experiments from which researchers drew oversized and often counterintuitive claims. Some, it turned out, incorporated errors in statistical analysis that made random noise look like surprising new findings.

In contrast, blind auditions were independently adopted by real orchestras, starting with the Boston Symphony Orchestra, in the latter part of the 20th century. The purpose was to prevent conductors from choosing their own students, or their personal favorites, and instead force them to focus entirely on the music. Blind selection has also been adopted for awarding astronomers time on the Hubble Space Telescope, a limited resource that goes to only a small fraction of the astronomers who submit proposals.

In the 1990s, two economists — Claudia Goldin, an economics professor at Harvard, and Cecilia Rouse, now an economics professor at Princeton University — set out to investigate whether blind selection in orchestras was the direct cause of a concurrent increase in the number of women hired for orchestra positions.

Goldin and Rouse went around the country to different orchestras to observe their auditioning practices and collect data on past practices as well as records of who auditioned and who got hired. Much of that data was buried in files in basements. They learned interesting things on the journey — including the fact that some orchestras used carpeting or other measures to disguise the difference in sound between male and female footsteps.

The results, published in 2000, were complicated. There are different rounds of selection (preliminary, semifinals and finals), and women did better in blind selections in some rounds but not others. This was reflected in the abstract of their paper, which admits up front that their data are noisy and some of their numbers don’t pass “standard tests of statistical significance.”

In an interview, Goldin said that they were particularly interested in seeing what happened to the subset of people who applied to both blind and nonblind auditions. Asking people to audition behind a screen might bring in a different, more diverse group of applicants, she said, but there were some musicians who applied to both kinds. Comparing how they performed in blind versus nonblind auditions would offer a kind of natural experiment. And that’s where those controversial numbers surface.

The paper says that, “using the audition data, we find that the screen increases — by 50 percent — the probability that a woman will be advanced from certain preliminary rounds and increases by severalfold the likelihood that a woman will be selected in the final round.” The results were cited by politicians and TED talk speakers, and often referenced by other researchers.

One of the critiques came from Columbia University statistics professor Andrew Gelman, whose blog posts have become known for identifying and explaining the kinds of statistical errors — or cheats — that have led to erroneous or misleading conclusions in social science and medical research.

He criticized the paper’s lack of clarity, writing that he could not figure out how the authors calculated the much-touted 50 percent figure, let alone the severalfold difference, so it was impossible for him to see whether these numbers stand up to statistical tests.

That’s a fair criticism. But even if their data were too noisy to determine that blind auditions increased female hires, that doesn’t prove that there’s no effect, or that discrimination didn’t exist. Goldin said that their number comes from isolating just the cases where the same people applied in both kinds of auditions, and applies, as the paper says, only to certain stages in the process.

A similar study of Hubble access time got a comparable result. When identifying information was removed from proposals, women became more likely than men to get approved — for the first time in the 18 years the data were tracked. As described in detail in Physics Today, the blinding also resulted in more time going to researchers from lesser-known institutions. Reviewers had to look at the substance of the proposals in more depth rather than relying on the track record of the proposers. A third study looked at coders and found that in gender-blind submissions, women’s code was more likely to be accepted than men’s; but when the coder’s gender was known, women’s code was accepted less often.

We shouldn’t lump a study that examined decades of hiring data at real orchestras in with the headlines that oversold findings that forced smiles make you happier, that hearing words associated with aging makes you walk more slowly and that women are much more likely to vote for Republicans at certain points in the menstrual cycle.

Unlike those other disappearing findings — which blustered about a whole new understanding of human nature or offered people too-easy-to-be-true life hacks — this blind audition paper was modest, claiming only to shed light on a cultural phenomenon at a particular place and time. There’s no reason to throw it into the trash heap of bad science.

Faye Flam is a Bloomberg columnist.
