Commentary / World

Algorithms and charting the great unknown

by Faye Flam

Bloomberg

As algorithms and artificial intelligence infiltrate science, don’t expect bots to replace researchers — but AI may guide scientists and funding agents toward the most promising unexplored territory.

There’s a lot of room for improvement in the way humans choose scientific questions, says sociology professor James Evans, who suggested using algorithms for guidance in a special section of the journal Science called “Toward a More Scientific Science.”

In an ideal world, science would work like an efficient search, with researchers fanning out in all directions, a few exploring even the most unlikely terrain, and everyone reporting back — even on the areas where nothing turned up — so as to avoid redundancy. Evans, who works at the University of Chicago, has examined the less-than-ideal way real science works, and he found there’s a lot of redundancy and inefficient clustering of research efforts. Why not try an algorithm to point scientists to potentially rich frontiers?

The same week his idea appeared in Science, a paper in PLOS Biology pointed out that in genetics, scientists are concentrating their efforts on just 10 percent of our roughly 20,000 genes, with wider exploration discouraged by career and funding constraints. The lead author, systems biologist Thomas Stoeger of Northwestern University, said that some of the widely studied genes are “popular” for good reason, such as a known connection to cancer or other diseases. But the attention is still disproportionate, he said, with popular genes getting 4,000 times the attention of others and some 27 percent of genes remaining completely unstudied.

Some of the bias has historical roots, he said. The genes getting the bulk of the attention in 2018 are the same ones that were known before the Human Genome Project identified the full complement of our genes in the early 2000s. These better-known genes were often the easiest to study: it was easy to make and study the protein they encode, for example, or they lent themselves to study in fruit flies and other model animals.

Stoeger said that young scientists who head off into the unknown part of the genome are 50 percent less likely to succeed in becoming independent researchers. It takes a lot longer to get up to speed in new territory, he said: you have to develop new tools, such as new transgenic animal models, before you can get going. And funding agents tend to shy away from very long-term, risky projects.

Human genetics is a little different from many other areas, because the Genome Project made the unexplored part of the genome a known unknown. In other fields, including physics and social science, the unknowns are often unknown unknowns. But a more risk-taking attitude on the part of funding agents can help people venture there, too.

Evans, who had suggested the use of an originality algorithm, said it’s hard to get funding agents to understand why they should invest in things that are less likely to succeed. The answer, he said, is to take an approach more like that of venture capitalists, recognizing that some of the projects with the lowest odds of success might produce the biggest payoffs.

For instance, in biomedical science, he and his colleagues have been able to use algorithms to predict 96 percent of the topics of a given year’s papers from previously published work. Not surprisingly, the papers that had the biggest impact were among the small minority that weren’t predicted.

Scientists cluster into disciplines, with well-established problems and methods for solving them, he said. Disciplines are necessary, “but like any good thing there are diminishing marginal returns.” After a while, those crowded areas become like an over-mined vein.

Too much crowding also contributes to the problem of unreliable results, he said, because different groups often use exactly the same methods. That means that attempts to replicate a discovery are not truly independent.

To make matters worse, scientists often can’t persuade journals to publish negative findings — results that suggest some predicted psychological effect doesn’t exist, a diet intervention doesn’t help people lose weight, or a drug doesn’t help cure a disease. And so other scientists are doomed to do those same experiments again, and if they make an error and get a positive result, that’s what will get into the published literature.

Originality isn’t what people associate with algorithms, but using them might help get funding agents and the people they fund out of their ruts. It fits with a line of thinking that automation, algorithms and AI won’t replace people so much as change the way we work. Evans compares the unknowns of science to a poorly charted sea, with unproductive questions lurking like icebergs. An algorithm might use the few visible tips to predict where the rest lie, helping people avoid the ice and venture further into the unknown.

Faye Flam has written for The Economist, The New York Times, The Washington Post, Psychology Today, Science and other publications. She has a degree in geophysics from the California Institute of Technology.