NEW YORK – Computers are getting pretty good at predicting the future. In many cases they do it better than people. That’s how Amazon figures out what you’re likely to buy, how Netflix knows what you might want to watch and how meteorologists come up with accurate 10-day forecasts.
Now a team of scientists has demonstrated that a computer can outperform human judges in predicting who will commit a violent crime. In a paper published last month, they described how they built a system that started with people already arrested for domestic violence, then figured out which of them would be most likely to commit the same crime again.
The technology could potentially spare victims from being injured, or even killed. It could also keep the least dangerous offenders from going to jail unnecessarily. And yet, there’s something unnerving about using machines to decide what should happen to people. If targeted advertising misfires, nobody’s liberty is at stake.
For two decades, police departments have used computers to identify times and places where crimes are more likely to occur, guiding the deployment of officers and detectives. Now they’re going a step further, using vast data sets to identify individuals who are criminally inclined. They’re doing this with varying levels of transparency and scientific testing. A system called Beware, for example, can rate citizens of Fresno, California, as posing a high, medium or low level of threat. Press accounts say the system amasses data not only on past crimes but also on Web searches, property records and social networking posts.
Critics warn that the new technology has been rushed into use without enough public discussion. One open question is how exactly the software works, which the manufacturer treats as a trade secret. Another is whether there’s scientific evidence that such technology works as advertised. By contrast, the recent paper on the system that forecasts domestic violence lays out what it can do and how well it does it.
One of the creators of that system, University of Pennsylvania statistician Richard Berk, said he works only with publicly available data on people who have already been arrested. The system isn’t scooping up and crunching data on ordinary citizens, he said; it makes the same forecasts that judges or police officers previously had to make when deciding whether to detain or release a suspect.
He started working on crime forecasting more than a decade ago, and by 2008 had built a computerized system that beat the experts at picking which parolees were most likely to re-offend. He used machine learning: feeding a computer many different kinds of data until it discovered patterns it could use to make predictions, which can then be tested against known outcomes.
In the domestic violence paper, published in February in the Journal of Empirical Legal Studies, Berk and Penn psychologist Susan Sorenson looked at data from about 100,000 cases, all occurring between 2009 and 2013. Here, too, they used a machine learning system, feeding a computer data on age, sex, zip code, age at first arrest and a long list of possible previous charges for such things as drunk driving, animal mistreatment and firearms crimes. They did not use race, though Berk said the system isn’t completely race blind because some inferences about race can be drawn from a person’s zip code.
The researchers used about two-thirds of the data to “train” the system, giving the machine access to the input data as well as the outcome: whether or not each person was arrested a second time for domestic violence. They used the remaining third to test the system, giving the computer only the information a judge could know at arraignment and checking how well it predicted who would be arrested for domestic violence again.
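For readers curious about the mechanics, here is a minimal sketch of that train-then-test procedure in Python. The records, the features and the simple lookup-table “model” are hypothetical stand-ins; the article does not describe the learning algorithm Berk and Sorenson used, so this should not be read as their method. What it does show is the key discipline: the model is built from two-thirds of the cases, and the held-out third is scored using features only, with the outcomes consulted afterward to grade the predictions.

```python
import random

# Each record is (features, rearrested). "features" is a tuple of coarse,
# hypothetical attributes such as (age_bucket, prior_arrest_bucket);
# "rearrested" is 1 if the person was arrested again, else 0.

def holdout_split(records, seed=0):
    """Shuffle the records, then keep two-thirds for training and one-third for testing."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = 2 * len(shuffled) // 3
    return shuffled[:cut], shuffled[cut:]

def fit(train):
    """Toy stand-in for a real learner: the observed re-arrest rate for each
    feature combination, plus the overall rate as a fallback."""
    totals = {}
    for features, rearrested in train:
        n, k = totals.get(features, (0, 0))
        totals[features] = (n + 1, k + rearrested)
    table = {f: k / n for f, (n, k) in totals.items()}
    overall = sum(r for _, r in train) / len(train)
    return table, overall

def predict_risk(model, features):
    """Score a case from its features alone; the outcome is never consulted."""
    table, overall = model
    return table.get(features, overall)

def accuracy(model, test, threshold=0.5):
    """Grade predictions on the held-out cases against what actually happened."""
    hits = sum((predict_risk(model, f) >= threshold) == bool(r) for f, r in test)
    return hits / len(test)

if __name__ == "__main__":
    rng = random.Random(42)
    records = []
    for _ in range(900):
        features = (rng.randrange(3), rng.randrange(3))
        # Synthetic outcome: a higher prior-arrest bucket raises the chance of re-arrest.
        rearrested = int(rng.random() < 0.1 + 0.25 * features[1])
        records.append((features, rearrested))
    train, test = holdout_split(records)
    model = fit(train)
    print(len(train), len(test), round(accuracy(model, test), 3))
```

Accuracy at a fixed threshold is used here only because it is the simplest score to read; a real evaluation would likely weigh the two kinds of error differently.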
It would be easy to reduce the number of repeat offenses to zero by simply locking up everyone accused of domestic violence, but jailing people who aren’t going to be dangerous has a cost, Berk said. Currently, about half of those arrested for domestic violence are released, he said. The challenge he and Sorenson faced was to keep releasing half, but to pick a less dangerous half. The result: About 20 percent of those released by judges were later arrested for the same crime. Among the people the computer chose, the figure was only 10 percent.
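The comparison Berk describes can be sketched in a few lines of Python: rank cases by predicted risk, release the lower-risk half and measure how many of them are later re-arrested. This assumes hypothetical risk scores from some trained model and, for simplicity, assumes the outcome is known for every case; the numbers are invented for illustration and do not come from the study.

```python
# Each case is (predicted_risk, rearrested). The risk scores would come from
# a trained model; here they are made-up numbers for illustration only.

def release_lowest_risk_half(cases):
    """Rank cases by predicted risk and 'release' the lower-risk half."""
    ranked = sorted(cases, key=lambda case: case[0])
    return ranked[: len(ranked) // 2]

def rearrest_rate(group):
    """Fraction of a group later arrested again for the same crime."""
    return sum(rearrested for _, rearrested in group) / len(group)

if __name__ == "__main__":
    cases = [
        (0.05, 0), (0.10, 0), (0.20, 0), (0.25, 1), (0.30, 0),
        (0.55, 1), (0.60, 0), (0.70, 1), (0.80, 1), (0.90, 1),
    ]
    released = release_lowest_risk_half(cases)
    print(len(released), rearrest_rate(released))  # 5 0.2
```

In this toy data the overall re-arrest rate is 0.5, so releasing a random half would give 0.5 on average; picking the lower-risk half by score brings it down to 0.2. The point mirrors the study’s framing: the number released stays fixed, and only the choice of whom to release changes.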
Berk and Sorenson are currently working with the Philadelphia police, he said, to adapt the machine learning system to predict which households are most at risk of domestic violence. Those households, he said, could then be targeted for extra supervision.
Berk’s parole-forecasting system is already in use in Philadelphia, where a machine learning system assigns parolees to high-, medium- and low-risk groups so that parole officers can focus most of their attention on the high-risk cases.
One downside might be a more one-dimensional decision-making process. Several years ago, when I wrote an article on the parole system for the Philadelphia Inquirer, I learned that some parole officers found the system constraining. They said that they could have a bigger impact by spending more time with low-risk offenders who were open to accepting help in getting their lives together.
Their concern was that their bosses would put too much faith in the system and too little in them. That echoes the problem Berk says worries him: that people will put too much trust in the technology. If a system hasn’t been through scientific testing, skepticism is in order. And even systems shown to beat human judgment are far from perfect. Machine learning can give crime fighters useful information, but at this stage it would be a mistake to let it make their decisions for them.
Faye Flam writes about science, mathematics and medicine.