In medicine, the cautionary tales about the unintended effects of artificial intelligence are already legendary.
There was the program meant to predict when patients would develop sepsis, a life-threatening reaction to infection, that triggered a litany of false alarms. Another, intended to improve follow-up care for the sickest patients, appeared to deepen troubling health disparities.
Wary of such flaws, physicians have kept AI working on the sidelines: assisting as a scribe, as a casual second opinion and as a back-office organizer. But the field has gained investment and momentum for uses in medicine and beyond.
Within the Food and Drug Administration, which plays a key role in approving new medical products, AI is a hot topic. It is helping to discover new drugs. It could pinpoint unexpected side effects. And it is even being discussed as an aid to staff who are overwhelmed with repetitive, rote tasks.
Yet, in one crucial way, the FDA’s role has been subject to sharp criticism: how carefully it vets and describes the programs it approves to help doctors detect everything from tumors to blood clots to collapsed lungs.
“We’re going to have a lot of choices. It’s exciting,” Dr. Jesse Ehrenfeld, president of the American Medical Association, a leading doctors’ lobbying group, said in an interview. “But if physicians are going to incorporate these things into their workflow, if they’re going to pay for them and if they’re going to use them — we’re going to have to have some confidence that these tools work.”
U.S. President Joe Biden planned to issue an executive order Monday that calls for regulations across a broad spectrum of agencies to try to manage the security and privacy risks of AI, including in health care. The order calls for more funding for AI research in medicine and also for a safety program to gather reports on harmful or unsafe practices. World leaders are set to meet later this week to discuss the topic.
No single U.S. agency governs the entire landscape. Sen. Chuck Schumer, D-N.Y., the majority leader, summoned tech executives to Capitol Hill in September to discuss ways to nurture the field and also identify pitfalls.
Google has already drawn attention from Congress with its pilot of a new chatbot for health workers. Called Med-PaLM 2, it is designed to answer medical questions but has raised concerns about patient privacy and informed consent.
How the FDA will oversee such “large language models” — programs that mimic expert advisers — is just one area where the agency lags behind rapidly evolving advances in the AI field. Agency officials have only begun to talk about reviewing technology that would continue to “learn” as it processes thousands of diagnostic scans. And the agency’s existing rules encourage developers to focus on one problem at a time — such as a heart murmur or a brain aneurysm — a contrast to AI tools used in Europe that scan for a range of problems.
The agency’s reach is limited to products being approved for sale. It has no authority over programs that health systems build and use internally. Large health systems such as Stanford, Mayo Clinic and Duke — as well as health insurers — can build their own AI tools that affect care and coverage decisions for thousands of patients with little to no direct government oversight.
Still, doctors are raising more questions as they attempt to deploy the roughly 350 software tools that the FDA has cleared to help detect clots, tumors or a hole in the lung. They have found few answers to basic questions: How was the program built? How many people was it tested on? Is it likely to identify something a typical doctor would miss?
The lack of publicly available information, perhaps paradoxical in a realm replete with data, is causing doctors to hang back, wary that technology that sounds exciting can lead patients down a path to more biopsies, higher medical bills and toxic drugs without significantly improving care.
Dr. Eric Topol, author of a book on AI in medicine, is a nearly unflappable optimist about the technology’s potential. But he said the FDA had fumbled by allowing AI developers to keep their “secret sauce” under wraps and failing to require careful studies to assess any meaningful benefits.
“You have to have really compelling, great data to change medical practice and to exude confidence that this is the way to go,” said Topol, executive vice president of Scripps Research in San Diego. Instead, he added, the FDA has allowed “shortcuts.”
Large studies are beginning to tell more of the story: One found benefits in using AI to detect breast cancer, while another highlighted flaws in an app meant to identify skin cancer, Topol said.
Dr. Jeffrey Shuren, chief of the FDA’s medical device division, has acknowledged the need for continuing efforts to ensure that AI programs deliver on their promises after his division clears them. Drugs and some devices are tested on patients before approval, but the same is not typically required of AI software programs.
A new approach could be building labs where developers could access vast amounts of data and build or test AI programs, Shuren said during the National Organization for Rare Disorders conference on Oct. 16.
“If we really want to assure that right balance, we’re going to have to change federal law, because the framework in place for us to use for these technologies is almost 50 years old,” Shuren said. “It really was not designed for AI.”
Other forces complicate efforts to adapt machine learning for major hospitals and health networks. Software systems don’t talk to one another. No one agrees on who should pay for them.
By one estimate, about 30% of radiologists (a field in which AI has made deep inroads) are using AI technology. Simple tools that might sharpen an image are an easy sell. But higher-risk ones, such as those selecting whose brain scans should be given priority, concern doctors if they do not know, for instance, whether the program was trained to catch the maladies of a 19-year-old versus a 90-year-old.
Aware of such flaws, Dr. Nina Kottler is leading a multiyear, multimillion-dollar effort to vet AI programs. She is the chief medical officer for clinical AI at Radiology Partners, a Los Angeles-based practice that reads roughly 50 million scans annually for about 3,200 hospitals, free-standing emergency rooms and imaging centers in the United States.
She knew that diving into AI would be delicate with the practice’s 3,600 radiologists. After all, Geoffrey Hinton, known as the “godfather of AI,” roiled the profession in 2016 when he predicted that machine learning would replace radiologists altogether.
Kottler said she began evaluating approved AI programs by quizzing their developers and then tested some to see which programs missed relatively obvious problems or pinpointed subtle ones.
She rejected one approved program that did not detect lung abnormalities beyond the cases her radiologists found — and missed some obvious ones.
Another program that scanned images of the head for aneurysms, a potentially life-threatening condition, proved impressive, she said. Although it flagged many false positives, it detected about 24% more cases than radiologists had identified. More people with an apparent brain aneurysm received follow-up care, including a 47-year-old with a bulging vessel in an unexpected corner of the brain.
At the end of a telehealth appointment in August, Dr. Roy Fagan realized he was having trouble speaking to the patient. Suspecting a stroke, he hurried to a hospital in rural North Carolina for a CT scan.
The image went to Greensboro Radiology, a Radiology Partners practice, where it set off an alert in a stroke-triage AI program. A radiologist didn’t have to sift through cases ahead of Fagan’s or click through more than 1,000 image slices; the slice showing the brain clot popped up immediately.
The radiologist had Fagan transferred to a larger hospital that could rapidly remove the clot. He woke up feeling normal.
“It doesn’t always work this well,” said Dr. Sriyesh Krishnan, of Greensboro Radiology, who is also director of innovation development at Radiology Partners. “But when it works this well, it’s life-changing for these patients.”
Fagan wanted to return to work the next Monday but agreed to rest for a week. Impressed with the AI program, he said, “It’s a real advancement to have it here now.”
Radiology Partners has not published its findings in medical journals. Some researchers who have, though, highlighted less inspiring instances of the effects of AI in medicine.
University of Michigan researchers examined a widely used AI tool in an electronic health record system meant to predict which patients would develop sepsis. They found that the program fired off alerts on 1 in 5 patients — although only 12% went on to develop sepsis.
Another program, which analyzed health costs as a proxy to predict medical needs, ended up steering treatment away from Black patients who were just as sick as white ones. The cost data turned out to be a bad stand-in for illness, a study in the journal Science found, since less money is typically spent on Black patients.
Those programs were not vetted by the FDA. But given the uncertainties, doctors have turned to agency approval records for reassurance. They found little. One research team looking at AI programs for critically ill patients found evidence of real-world use “completely absent” or based on computer models. The University of Pennsylvania and University of Southern California team also discovered that some of the programs were approved based on their similarities to existing medical devices — including some that did not even use AI.
Another study of FDA-cleared programs through 2021 found that of 118 AI tools, only one described the geographic and racial breakdown of the patients the program was trained on. The majority of the programs were tested on 500 or fewer cases — not enough, the study concluded, to justify deploying them widely.
Dr. Keith Dreyer, a study author and chief data science officer at Massachusetts General Hospital, is now leading a project through the American College of Radiology to fill that information gap. With the help of AI vendors who have been willing to share information, he and colleagues plan to publish an update on the agency-cleared programs.
That way, for instance, doctors can look up how many pediatric cases a program was built to recognize to inform them of blind spots that could potentially affect care.
James McKinney, an FDA spokesperson, said the agency’s staff members review thousands of pages before clearing AI programs, but he acknowledged that software makers may write the publicly released summaries. Those are not “intended for the purpose of making purchasing decisions,” he said, adding that more detailed information is provided on product labels, which are not readily accessible to the public.
Getting AI oversight right in medicine, a task that involves several agencies, is critical, said Ehrenfeld, the AMA president. He said doctors have scrutinized the role of AI in deadly plane crashes to warn about the perils of automated safety systems overriding a pilot’s — or a doctor’s — judgment.
He said the 737 Max inquiries had shown how pilots weren’t trained to override a safety system that contributed to the deadly crashes. He is concerned that doctors might encounter a similar use of AI running in the background of patient care that could prove harmful.
“Just understanding that the AI is there should be an obvious place to start,” Ehrenfeld said. “But it’s not clear that that will always happen if we don’t have the right regulatory framework.”
This article originally appeared in The New York Times © 2025 The New York Times Company