For the last several years, the U.S. FDA and its stakeholders have been calling for a new regulatory approach for artificial intelligence (AI) used in healthcare. Pretty much everyone seems to agree that AI is different from other medical devices, in part because AI learns on the job. That learning changes performance over time, and companies may update their algorithms frequently; both realities suggest the need for a new regulatory approach.
In light of that general agreement, what progress are we making toward that goal? This article focuses on some of FDA’s initiatives over the last year and specifically examines the agency’s efforts to define the scope of its regulation of AI, the process and substance of FDA premarket reviews, and finally a few special issues associated with autonomous AI.
1. SCOPE OF FDA REGULATION OF MEDICAL AI
Over the last couple of years, both Congress and FDA have been working to clarify what software is regulated and what is not. The most recent phase of those efforts began in December 2016, when Congress passed the 21st Century Cures Act (Cures Act); section 3060(a) of that law modified the definition of a medical device in the Food, Drug & Cosmetic Act. The 2016 amendments laid out several categories of software that Congress excluded from FDA regulation. A year later, in 2017, FDA started to implement that legislation by publishing a few draft guidances, including one on Clinical and Patient Decision Support Software. Much has been written on that topic over the last several years, so this article will focus only on developments in implementing this section since last fall.
The easiest way to discuss this is to break it into three different categories:
a. Software that Functions as an Accessory to a Medical Device
Software remains regulated under the Cures Act if it is “intended to acquire, process, or analyze a medical image or a signal from an in vitro diagnostic device or a pattern or signal from a signal acquisition system.” That language is difficult to understand, but at a high level Congress was trying to reserve within FDA’s regulatory scope software that, for example, analyzed radiological images or read EKG signals. In September 2019, FDA attempted to clarify that language in a new Draft Guidance on Clinical Decision Support Software. 1 Among other things, in the guidance FDA suggests that the agency focuses on software that analyzes “physiological signals” for medical purposes such as diagnosis or therapeutic decision making.
b. Software as a Medical Device (SaMD)
FDA potentially regulates clinical decision support (CDS) software that analyzes medical information to support or provide a recommendation to a healthcare professional about prevention, diagnosis, or treatment of a disease or condition. I say “potentially” because such software is excluded from FDA regulation if the software enables the professional “to independently review the basis for the recommendations that such software presents so that it is not the intent that the [professional] rely primarily on any of such recommendations to make a clinical diagnosis or treatment decision regarding an individual patient.” 2
In its 2017 proposed guidance on CDS, FDA had taken the position that “in order for the software function to be excluded from the definition of device, the intended user should be able to reach the same recommendation on his or her own without relying primarily on the software function.” That test would effectively keep software that uses machine learning within FDA regulation, because such software generally cannot meet it.
In the September 2019 guidance on CDS, FDA changed its view and took a more flexible approach to determining whether or not the professional user can review and understand the basis for the recommendations. After reciting several categories of information that the guidance wants developers to share with professional users, the guidance sums up the test by explaining: “A practitioner would be unable to independently evaluate the basis of a recommendation, and therefore would be primarily relying upon it, if the recommendation were based on information whose meaning could not be expected to be independently understood by the intended HCP user (e.g., the inputs used to generate the recommendation are not identified).”
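To make that transparency test more concrete, here is a minimal sketch of what a CDS output might look like if a developer wanted the clinician to be able to independently evaluate the basis for a recommendation. The structure, field names, and clinical content are purely illustrative assumptions on my part, not a schema FDA has prescribed.

```python
# Illustrative only: a hypothetical CDS output structured so the clinician can
# see which inputs drove the recommendation and the basis for it. The field
# names and clinical content are assumptions, not an FDA-prescribed format.
from dataclasses import dataclass
from typing import List

@dataclass
class CdsRecommendation:
    recommendation: str            # the suggested action
    inputs_used: List[str]         # the patient data points the software relied on
    rationale: str                 # plain-language basis the clinician can check
    sources: List[str]             # e.g., the guideline or literature relied upon

example = CdsRecommendation(
    recommendation="Consider statin therapy",
    inputs_used=["LDL-C = 165 mg/dL", "age = 58", "10-year ASCVD risk = 12%"],
    rationale="Intermediate ASCVD risk with elevated LDL-C per the cited guideline.",
    sources=["2018 AHA/ACC Cholesterol Guideline"],
)
print(example)
```

The point is not the particular fields; it is that the inputs and the reasoning are surfaced to the user rather than hidden inside the model.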
Unfortunately, FDA took a step backward by declaring that this transparency-based exclusion applies only to the lowest risk category of CDS software, a position found nowhere in the federal statute. Several organizations pointed that out in comments submitted on the September proposal. We have not seen a final guidance.
But it is interesting that in March 2020, FDA was willing to jettison this limitation in response to the coronavirus pandemic. In its Enforcement Policy for Non-Invasive Remote Monitoring Devices Used to Support Patient Monitoring During the Coronavirus Disease-2019 (COVID-19) Public Health Emergency, FDA is willing, at least for the duration of the emergency, to set aside its proposed limitation and allow the development of CDS software targeting the coronavirus on the basis of compliance with the transparency requirement, even though the virus clearly is deadly.
c. Unregulated Software, including Enforcement Discretion
This topic gets confusing because there is certain software that FDA cannot regulate by statute, and then there is certain software that FDA says it has the right to regulate but simply chooses not to. That distinction makes my head hurt, so I’m going to lump it all together here.
FDA updated a whole slew of guidance documents defining what’s unregulated in September 2019, at the same time the agency proposed the new CDS guidance. The guidances that FDA updated in September include:
- Changes to Existing Medical Software Policies Resulting from Section 3060 of the 21st Century Cures Act
- Policy for Device Software Functions and Mobile Medical Applications
- Medical Device Data Systems, Medical Image Storage Devices, and Medical Image Communications Devices
- General Wellness: Policy for Low Risk Devices
- Off-The-Shelf Software Use in Medical Devices
The guidances explain both software categories no longer regulated under the Cures Act as well as software categories where FDA is willing to exercise enforcement discretion.
2. PROCESS FOR REVIEWING AI PREMARKET SUBMISSIONS
a. Current Process: De Novo
Presently, there is a dearth of predicate devices for many AI applications outside of radiology. As a consequence, outside of radiology, many of the new AI applications are coming to market via the de novo review process.
Unfortunately, that process is long and largely unpredictable. In practice it is more like a premarket approval application, in the sense that FDA has a freer hand to define what the agency wants to see than it has in the 510(k) process in which the agency is limited to those issues related to substantial equivalence. The de novo process requires an assurance of safety and effectiveness typically accomplished through a clinical trial. On top of that, because it’s a reclassification process, the applicant has to demonstrate regulate-ability, which means that:
- The safety and effectiveness profile has to be proven to be low enough that FDA is comfortable with reclassifying the product into class I or II, and
- If class II, the agency knows how to develop special controls to ensure safety and effectiveness of the product category.
While there are exceptions like Apple, which managed to breeze through the de novo process in short order, the vast majority of de novo applications are proceeding through a long and tortuous route. While the long-term trend shows the number of de novo submissions is growing, FY 2019 was not very encouraging. See the table below on CDRH’s De Novo Performance Metrics. 3
| Performance Metric | FY2018 | FY2019 |
| --- | --- | --- |
| De Novos Accepted | 56 | 61 |
| Number with MDUFA IV Decisions | 55 | 25 |
| Number with Granted Decisions | 25 | 5 |
| Number with Declined Decisions | 15 | 10 |
| Number of Withdrawals | 10 | 10 |
| Number Deleted | 5 | 0 |
| Rate of Granted Decisions | 45.45% | 20.00% |
| Rate of Declined Decisions | 27.27% | 40.00% |
| Rate of Withdrawals | 18.18% | 40.00% |
| Rate of Deleted | 9.09% | 0.00% |
Table I: CDRH’s De Novo Performance Metrics. (The rates are each outcome count divided by the number of MDUFA IV decisions for that fiscal year.) This low level of favorable decisions has ominous implications for AI-based products.
b. Potential Future Process: Precertification
Three years ago FDA launched an initiative to develop a Precertification Program that would shift the regulatory focus from the product to the developer. FDA reasoned that if it had confidence in health software developers and believed them to have a culture of quality and organizational excellence, FDA could allow the software onto the market with an abbreviated review, so long as the company agreed to submit to much greater FDA oversight during the postmarket phase.
FDA’s last official update on the precertification program came out last summer. Not much has been officially said about the program since then, but recently the leader of the FDA program, Bakul Patel, gave an interview 4 in which he indicated that the pilot phase from 2019 would extend through 2020. He also noted that the program is getting more complex as the agency works through the details, which has slowed progress, and that FDA now recognizes it will need to obtain statutory authority for the program. That means involving congressional stakeholders such as Senator Elizabeth Warren, who has concerns about whether the new approach will adequately protect patients.
3. SUBSTANTIVE REQUIREMENTS AI WILL HAVE TO MEET PRIOR TO MARKETING
a. Preclinical and Clinical Testing
For those who want to understand what FDA would require in the form of preclinical and clinical testing for new uses of AI outside of radiology, FDA consistently recommends that developers follow the two primary radiology guidances, one of which was updated in January 2020:
- Clinical Performance Assessment: Considerations for Computer-Assisted Detection Devices Applied to Radiology Images and Radiology Device Data – Premarket Approval (PMA) and Premarket Notification [510(k)] Submissions, January 2020
- Computer-Assisted Detection Devices Applied to Radiology Images and Radiology Device Data – Premarket Notification [510(k)] Submissions, July 2012
The 510(k) guidance is less useful simply because there are fewer 510(k) opportunities available outside radiology due to the lack of predicate devices. But it nonetheless has good guidance on how to address the intended use issues associated with machine learning.
b. Good Machine Learning Practices
In April 2019, FDA issued a concept paper 5 at the end of Commissioner Scott Gottlieb’s term in office. In the paper, FDA suggests that AI-based software is different enough from other medical devices that it requires its own quality system requirements. FDA suggests that key areas of quality standards unique to AI should include:
- Relevance of available data to the clinical problem and current clinical practice.
- Data acquired in a consistent, clinically relevant and generalizable manner that aligns with the SaMD’s intended use and modification plans.
- Appropriate separation between training, tuning, and test datasets (a minimal sketch of one way to do this follows this list).
- Appropriate level of transparency (clarity) of the output and the algorithm aimed at users.
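On that dataset-separation point, the following is a minimal sketch, under my own assumptions (scikit-learn, synthetic data, patient-level grouping), of one way to keep training, tuning, and test sets separated so that records from the same patient never leak across sets. It is an illustration of the idea only, not anything the concept paper or FDA specifies.

```python
# A minimal sketch of patient-level separation between training, tuning, and
# test sets. The library choice (scikit-learn), the synthetic data, and the
# split sizes are all assumptions for illustration only.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_records = 1000
X = rng.normal(size=(n_records, 10))                  # features (e.g., derived measurements)
y = rng.integers(0, 2, size=n_records)                # labels (e.g., disease present / absent)
patient_ids = rng.integers(0, 300, size=n_records)    # multiple records per patient

# Carve out a held-out test set by patient, then split the remainder into
# training and tuning (validation) sets, again grouping by patient.
outer = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_tune_idx, test_idx = next(outer.split(X, y, groups=patient_ids))

inner = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_sub, tune_sub = next(
    inner.split(X[train_tune_idx], y[train_tune_idx], groups=patient_ids[train_tune_idx])
)
train_idx, tune_idx = train_tune_idx[train_sub], train_tune_idx[tune_sub]

# No patient should appear in more than one of the three sets.
assert not set(patient_ids[train_idx]) & set(patient_ids[tune_idx])
assert not set(patient_ids[train_idx]) & set(patient_ids[test_idx])
assert not set(patient_ids[tune_idx]) & set(patient_ids[test_idx])
```

However the data are actually sourced, the discipline is the same: the tuning and final test sets should not contain the patients the model was trained on.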
There has not been a lot of publicly released activity since then, perhaps because FDA has had some challenges recruiting people well-versed in data science.
But there are some private groups working to flesh out some of these ideas. One, for example, is the Good Machine Learning Practices Team that is working as a part of the Xavier University AI Initiative. 6 Under the university’s oversight, a group of about 30 experts across many sectors and domains is working to identify Good Input Data Quality Practices that the agency could then review as a starting point in developing its own Good Machine Learning Practices.
c. Special Approval Requirements for Adaptive AI
In 2019, FDA began to focus on adaptive AI. Prior to that, virtually all AI software needed to be locked to secure FDA clearance or approval. But companies were pushing for FDA to consider permitting the marketing of AI that could evolve with use.
Part of the April 2019 FDA concept paper was dedicated to the idea that software developers could, as a part of securing marketing permission, propose to the agency parameters within which they could make changes to update their algorithm or allow it to adapt itself, and a test protocol that they would follow in validating those changes. As I said, not much official has happened on that concept paper, but that’s not stopping companies from proposing these approaches individually in premarket submissions.
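To give a rough sense of that concept, the sketch below shows how a developer might gate deployment of a retrained algorithm behind pre-specified performance bounds and a locked test protocol. The metric names, thresholds, and structure are my own assumptions; they are not drawn from the concept paper or from any FDA-specified procedure.

```python
# A rough sketch of the "pre-specified change envelope plus test protocol" idea.
# The metric names, thresholds, and structure are hypothetical assumptions made
# for illustration, not an FDA-specified procedure.
from dataclasses import dataclass

@dataclass
class ChangeEnvelope:
    min_sensitivity: float   # performance floor pre-specified at clearance
    min_specificity: float
    max_input_drift: float   # e.g., allowed shift in the input data distribution

def passes_change_protocol(candidate_metrics: dict, envelope: ChangeEnvelope) -> bool:
    """Apply the locked test protocol to a retrained candidate model and decide
    whether the update stays within the pre-specified envelope."""
    return (
        candidate_metrics["sensitivity"] >= envelope.min_sensitivity
        and candidate_metrics["specificity"] >= envelope.min_specificity
        and candidate_metrics["input_drift"] <= envelope.max_input_drift
    )

envelope = ChangeEnvelope(min_sensitivity=0.87, min_specificity=0.90, max_input_drift=0.05)
candidate = {"sensitivity": 0.89, "specificity": 0.91, "input_drift": 0.02}
print("Deploy update:", passes_change_protocol(candidate, envelope))  # Deploy update: True
```

The design choice worth noticing is that the test protocol, not the developer's judgment in the moment, decides whether an update ships; that is what would give the agency comfort with changes made after clearance.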
For example, the documentation around FDA’s de novo decision concerning IDx-DR, 7 an artificially intelligent algorithm that analyzes eye images taken with a retinal camera to detect more than mild diabetic retinopathy, reveals some of the same regulatory strategies. The submission is groundbreaking in a number of ways, including the fact that FDA’s ultimate decision allows the software to operate autonomously, a first of its kind. The software operates autonomously in the sense that the image analyzed by the software does not itself get reread by a human. There is, to be clear, still confirmatory testing required before treatment is pursued.
4. ISSUES CREATED BY ALLOWING AI TO OPERATE AUTONOMOUSLY
FDA premarket authorization of AI that operates autonomously is the brave new world. While IDx-DR was the first, the second occurred in early 2020. Through a de novo, FDA gave marketing authorization to a product called Caption Guidance, a system that guides an untrained user of an ultrasound system. 8 The idea is that it is useful to have ultrasound in the hands of folks like army medics, as well as nurses with no special training in sonography.
FDA understands this is the future and as a result held a public workshop on the Evolving Role of Artificial Intelligence in Radiological Imaging on February 25–26, 2020. Throughout this summary I discuss radiological imaging only because that is where AI is being deployed first in many ways. But what FDA decides with regard to radiology will set the precedent for all other technologies, unless they can be adequately distinguished.
For those interested, the meeting was recorded and it may well be a productive use of your time to listen especially to the FDA presentations that summarize the regulatory requirements for AI. The first day focused on autonomous AI, and the second focused on AI used to guide medical device application.
There were two important discussion points.
a. Big Data
First, FDA expects big data sets for training AI that will be used autonomously, because the agency wants to make sure that most of the fringe use cases are considered. This is problematic for small companies, especially startups, because larger datasets may be more difficult or expensive for such companies to obtain. It’s also unnecessary in certain cases where smaller data sets may actually work better for several reasons. For example, requiring large data sets would:
- Cause people to use more questionable data. Bigger doesn’t mean better. There are plenty of large datasets that include, in the vernacular, garbage. Simply using a large dataset where the quality of the data is not as strong means producing a worse, not a better, algorithm.
- Inhibit the ability of software developers to tackle new, emerging threats like the novel coronavirus.
- Discourage the use of new, innovative approaches to machine learning. For example, an algorithm that is based on a blend of rules and data often does better with smaller, more focused data sets, as the sketch after this list illustrates. The same is true for an algorithm based on so-called “knowledge-based reasoning.”
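To illustrate that rules-plus-data point, here is a minimal sketch of a hybrid classifier in which hard-coded rules decide the clear-cut cases and a small learned model is trained only on the ambiguous remainder, which is one reason a smaller, more focused dataset can suffice. The rules, thresholds, and synthetic data are invented for illustration and are not clinically validated.

```python
# A minimal sketch of a hybrid "rules plus data" classifier: deterministic rules
# handle clear-cut cases; a small learned model covers only the ambiguous middle.
# The thresholds and synthetic data are illustrative assumptions, not clinical rules.
import numpy as np
from sklearn.linear_model import LogisticRegression

def rule_label(features):
    """Return 1 or 0 when a hard rule applies, or None for ambiguous cases."""
    biomarker, _age = features
    if biomarker > 10.0:      # hypothetical "clearly positive" threshold
        return 1
    if biomarker < 1.0:       # hypothetical "clearly negative" threshold
        return 0
    return None

rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 12, 500), rng.uniform(20, 90, 500)])  # biomarker, age
y = (X[:, 0] + rng.normal(0, 1, 500) > 6).astype(int)                     # synthetic labels

# Train the model only on the cases the rules cannot decide, so the learned
# component needs data only for the genuinely ambiguous region.
ambiguous = np.array([rule_label(row) is None for row in X])
model = LogisticRegression().fit(X[ambiguous], y[ambiguous])

def predict(features):
    ruled = rule_label(features)
    return ruled if ruled is not None else int(model.predict([features])[0])

print(predict([11.5, 40]), predict([0.5, 70]), predict([5.0, 55]))
```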
b. Postmarket Reporting
A second issue that arose is a strong bias at FDA toward expanding the agency’s legal authority to require postmarket data collection and enhanced postmarket reporting to FDA. Its argument is that if FDA is going to be less certain of safety and effectiveness at the time of marketing authorization, it ought to receive more information postmarket. At this point, FDA has been very vague about exactly what it wants to do in this arena.
Practically speaking, there are really only two alternatives, or a blend of the two, if FDA is going to seek legal authority to require more information. The new requirement would need to force companies to supply either:
- Raw data collected from users, and/or
- More reports from software developers analyzing raw data collected from users.
To sell this idea, FDA is suggesting that these requirements would be only voluntary, a way for companies to get earlier approvals. But given the competitive nature of the marketplace, that’s not truly voluntary. Further, Congress, on behalf of patients as well as companies, will have something to say about any new system and whether it does an adequate job of achieving the purposes of the statute.
If FDA opts for requiring raw data sent to the agency postmarket, there are several problems with that including, for example:
- The agency will never know as much about how the product is being used as the company knows, and lacking that context, it’s quite likely that the agency would overreact frequently. It may also underreact, because any signal will be hidden in the noise.
- Just practically, the agency will have no way to devote the necessary resources—human or computer—to reviewing raw data streams on every single AI product out there, especially once AI becomes more prevalent. This is true even if FDA tries to leverage the resources of the National Evaluation System for health Technology (NEST). That group is already stretched thin collecting real-world evidence for all medical devices.
- This approach also creates a substantial risk that confidential commercial information including details about an algorithm would be released to the public and to competitors.
If the plan is to increase the number of reports that software developers must submit analyzing the raw data, then this sort of postmarket burden becomes an impossible one for most companies, and especially for startups that have been the source of so many innovations in AI.
The bottom line is that expanded postmarket reporting raises considerable challenges because it puts the agency in a position to micromanage firms on the basis of incomplete information. In contrast, the current Medical Device Reporting system is designed to strike a balance between burden and benefit. It ensures that FDA gets the benefit of the company being vigilant and separating the signal from the noise. But it avoids the burdens of sucking away resources from other parts of the quality system to the detriment of patients.
To be clear, there’s absolutely no doubt that companies will need to engage in enhanced vigilance with an autonomous, AI-based product. And indeed, the records of that vigilance will be held within each company’s quality system and available to FDA for inspection. But requiring FDA to make an on-site visit to inspect such data is the only safeguard we have against an overly intrusive regulatory body.
CONCLUSION
On the whole, FDA seems very enthusiastic about the possibilities for AI to improve healthcare. That’s gratifying to see. While progress in the development of a new regulatory approach is slow, that’s also not all bad. It would be worse in many ways for FDA to act precipitously. These are complicated issues, and it will take time to develop an appropriate approach, in part because it will take time for FDA to acquire enough expertise. And most likely, whatever new approach we come up with will require new statutory authority. That said, it is incumbent on all of us to work as quickly as we can to bring these exciting new developments to waiting patients. The possible improvements to patient care are enormous.
References
1. Clinical Decision Support Software, Draft Guidance for Industry and FDA Staff, September 2019, https://www.fda.gov/regulatory-information/search-fda-guidance-documents/clinical-decision-support-software.
2. Section 201(h) of the Food, Drug & Cosmetic Act (21 U.S.C. 321(h)).
3. Agenda for Quarterly Meeting on MDUFA IV (FY 2018-2022) Performance, November 15, 2019, page 277, https://www.fda.gov/media/132770/download.
4. “FDA still trying to fine-tune Pre-Cert as pilot enters 2020,” Medtech Dive, March 2020, https://www.medtechdive.com/news/fda-pre-cert-software-device-pilot-enters-another-year/574822/.
5. Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) – Discussion Paper and Request for Feedback, FDA, https://www.fda.gov/files/medical%20devices/published/US-FDA-Artificial-…
6. Good Machine Learning Practices Team, Xavier Health, https://www.xavierhealth.org/gmlp-team. [Disclosure: the author serves on the team.]
7. De Novo Classification Request for IDx-DR, https://www.accessdata.fda.gov/cdrh_docs/reviews/DEN180001.pdf.
8. De Novo Classification Request for Caption Guidance, https://www.accessdata.fda.gov/cdrh_docs/reviews/DEN190040.pdf.