NEW YORK - The news that researchers could use Facebook feeds to “predict” whether you suffer from any of 21 medical conditions came across as impressive, but also unnerving. The conditions included some potentially embarrassing ones, such as sexually transmitted diseases; several that might put people at risk of discrimination, such as depression and psychosis; and pregnancy, which some people might feel should be their news to tell.
But don’t worry. Big Brother doesn’t really know about that infection.
Once you get past the headlines, the actual scientific data showed only that the researchers could guess better than blind chance when given access to people’s Facebook feeds — not predict that you had any particular condition. For just 10 of the 21 conditions, the Facebook feeds were more informative than basic demographic data such as sex, ethnicity and age.
And for those worried about privacy and the mysterious tentacles of Facebook, it’s reassuring that the predictive power worked only on big sharers. The study, carried out by researchers at Penn Medicine and Stony Brook University, used 999 volunteers, but these were whittled down from a group of more than 1,700 when it turned out a good fraction of them didn’t share enough information.
For the remaining volunteers, the researchers compared Facebook-based guesses with what actually ailed people according to their electronic medical records. The results were published last month in the journal PLOS ONE.
The guesses were based on the frequency with which the volunteers used certain words — which was kind of interesting. People who used a lot of unprintable slang terms for genitalia and sex acts were more likely to have been diagnosed with sexually transmitted infections, which isn’t really that mysterious. More surprising was the fact that people who made the most use of the words “God” and “pray” were more likely to have Type 2 diabetes.
Researchers too often assume a causal relationship whenever one seems vaguely plausible, including in cases where prayer was associated with the absence of a disease. Here, it seems unlikely that posting about God or prayer causes disease. It might be, instead, that there’s a connection between religious tradition and food tradition, and a connection between food traditions and Type 2 diabetes. Or maybe diabetes and discussion of prayer are more common in the same parts of the nation, for unrelated reasons.
So why was the news presented as a big deal? For one thing, it’s easy to make your prediction sound impressive or scary by comparing it to something that’s not very predictive.
In a press release, one of the researchers boasted that their analysis was more predictive of Type 2 diabetes than was body mass index, but the latest evidence suggests body mass index has been overrated as a predictor of health problems.
The press release also made much of the fact that the Facebook feeds did better than ordinary demographic data for almost half the conditions studied, but demographic data doesn’t set a very high bar.
Researchers did something similar several years ago to build up claims that the company Cambridge Analytica could glean information from Facebook feeds and use it to outperform people’s friends in predicting results of personality tests — information that could help the company manipulate audiences with political ads. It sounds scary, but friends are not that good at predicting such results. People are not that good at it even for themselves.
The technique used for the health condition study might still prove useful for discerning patterns in public health.
“What’s exciting about this is that it provides the first set of evidence that you can use social media to predict disease,” said Sean Young, head of the University of California Institute for Prediction Technology. So far, he said, “the main benefits are to researchers.” This technique is not going to be deployed as a substitute for your annual physical or as a basis for public health policy.
That reality is reassuring for our privacy, if disappointing for those wanting Facebook to deliver deep new insights into why people get sick. Even with some advanced big data techniques that scanned 20 million words, researchers didn’t do much better than they would have using demographic data and guesswork. Also, remember: The less you share, the less they can figure out.
Faye Flam is a Bloomberg Opinion columnist. She has written for The Economist, The New York Times, The Washington Post, Psychology Today, Science and other publications. She has a degree in geophysics from the California Institute of Technology.