CCC Blog

The goal of the Computing Community Consortium (CCC) is to catalyze the computing research community to debate longer-range, more audacious research challenges; to build consensus around research visions; to evolve the most promising visions toward clearly defined initiatives; and to work with the funding organizations to move challenges and visions toward funding initiatives. The purpose of this blog is to provide a more immediate, online mechanism for dissemination of visioning concepts and community discussion/debate about them.

Keys to Biomedical Innovation: “Data Mining & Information Sharing”

October 28th, 2011 / in policy, research horizons / by Erwin Gianchandani

Earlier this month at an event in Washington, DC, Food and Drug Administration (FDA) Commissioner Margaret Hamburg, Ph.D., released a blueprint for spurring biomedical innovation and improving human health, titled “Driving Biomedical Innovation: Initiatives for Improving Products for Patients.” Stemming from “a review of FDA’s current policies and practices, as well as months of meetings with major stakeholders,” the report “addresses concerns about the sustainability of the medical product development pipeline, which is slowing down despite record investments in research and development.” Among the major actions the blueprint sets out is harnessing the potential of data mining and machine learning while protecting patient privacy:

As noted in PCAST’s Report to the President on Health Information Technology, [information technology] has the potential to transform healthcare and — through innovative capabilities — improve safety and efficiency in the development of new tools for medicine, support new clinical studies for particular interventions that work for different patients, and transform the sharing of health and research data…

In particular:

FDA currently houses the largest known repository of clinical data (all of which is de-identified to protect patients’ privacy), including all the safety, efficacy, and performance information that has been submitted to the Agency for new products, as well as an increasing volume of postmarket safety surveillance data. The ability to integrate and analyze these data could revolutionize the development of new patient treatments and allow us to address fundamental scientific questions about how different types of patients respond to therapy. It would also provide an enhanced knowledge of disease parameters — such as meaningful measures of disease progression and biomarkers of safety and drug responses that can only be gained by analyses of large, pooled data sets — and would allow a determination of ineffective products earlier in the development process.

Additionally, the ability to share information in a public forum about why products fail, without compromising proprietary information, presents the potential to save companies millions of dollars by preventing duplication of failure. FDA sometimes sees applications from multiple companies for the same or similar products. Although we may have reason to believe that such a product is likely to fail or that trial design endpoints will not provide necessary information based on a previous application from another company, we are currently unable to share this information. As a result, companies may pour resources into the development of products that FDA knows could be dead ends.

To harness the potential of information sharing and data mining, FDA is rebuilding its IT and data analytic capabilities and establishing science enclaves that will allow for the analysis of large, complex datasets while maintaining proprietary data protections and protecting patients’ information.

The report goes on to describe new “scientific computing” approaches the FDA will pursue…

Historically, the vast majority of FDA de-identified clinical trial data has gone un-mined because of the inability to combine data from disparate sources and the lack of computing power and tools to perform such complex analyses. However the advent of new technologies, such as the ability to convert data from flat files or other formats like paper into data that can be placed in flexible relational database models, dramatic increases in supercomputing power, and the development of new mathematical tools and approaches for analyzing large integrated data sets, has radically changed this situation. Furthermore, innovations in computational methods, including many available as open-source, have created an explosion of statistical and mathematical models that can be exploited to mine data in numerous ways to enable scientists to analyze large complex biological and clinical data sets.

The FDA scientific computing model provides an environment where communities of scientists, known as enclaves, can come together to analyze large, integrated data sets and address important questions confronting clinical medicine. These communities will be project-based and driven by a specific set of questions that will be asked of a dataset. Each enclave is defined by its participants, datasets, and sets of interrogations to be performed on the data. Enclaves may be comprised of internal FDA scientists and reviewers working together or outside collaborators working with FDA scientists under an appropriate set of security controls to protect the sensitive and proprietary data of patients and sponsors, respectively. Engagement of industry sponsors as part of community building will be vigorously pursued, leveraging expertise from the companies that submitted the data in a public-private partnership model…

The ability to integrate large data sets across multiple clinical trials, post-market surveillance data, and pre-clinical data will enable FDA to generate new insights into a variety of important issues confronting medical product development and use. Examples of such insights include the identification of patient subsets who do or do not respond to a specific therapy during a clinical trial, which has the potential to drive personalized medicine; identification of patient subsets with differential safety profiles, efficacy, or side effects related to age or gender; evaluations of standard of care; analyses of disease progression; assessment of current endpoints based on aggregated data; and potential to generate better endpoints and insight into placebo effects. This work, which will address broader scientific issues, is intended to impact whole product classes and therapeutic areas and will be central to driving innovations in medical product development and basic research.
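The quoted passages describe two steps in general terms: moving de-identified records out of flat files into a relational model, then querying the pooled data for patient subsets that respond differently to a therapy. As a rough illustration only, here is a minimal Python sketch of that pattern using the standard library's `sqlite3` and `csv` modules. The two trial exports, their column names (`age_group`, `sex`, `responded`), and the helpers `load_trials` and `response_by_subset` are all hypothetical, invented for this example; nothing here reflects FDA's actual enclave software or data.

```python
import csv
import io
import sqlite3

# Two HYPOTHETICAL flat-file exports from separate trials of the same therapy.
# Column names and values are invented for illustration only.
TRIAL_A = """patient_id,age_group,sex,responded
A1,under_65,F,1
A2,under_65,M,0
A3,65_plus,F,1
A4,65_plus,M,0
"""
TRIAL_B = """patient_id,age_group,sex,responded
B1,under_65,F,1
B2,65_plus,F,0
B3,65_plus,M,0
B4,under_65,M,1
"""


def load_trials(conn, trials):
    """Load each flat file into one relational table, tagging rows by trial."""
    conn.execute(
        "CREATE TABLE outcomes ("
        " trial TEXT, patient_id TEXT, age_group TEXT, sex TEXT, responded INTEGER)"
    )
    for trial_name, text in trials.items():
        rows = csv.DictReader(io.StringIO(text))
        conn.executemany(
            "INSERT INTO outcomes VALUES (?, ?, ?, ?, ?)",
            [
                (trial_name, r["patient_id"], r["age_group"], r["sex"], int(r["responded"]))
                for r in rows
            ],
        )


def response_by_subset(conn, column):
    """Return (subset value, patient count, pooled response rate) per subset."""
    # SQL placeholders cannot bind column names, so check against a whitelist.
    if column not in {"age_group", "sex"}:
        raise ValueError(f"unsupported subset column: {column}")
    query = (
        f"SELECT {column}, COUNT(*), AVG(responded) FROM outcomes "
        f"GROUP BY {column} ORDER BY {column}"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_trials(conn, {"trial_a": TRIAL_A, "trial_b": TRIAL_B})
    for value, n, rate in response_by_subset(conn, "age_group"):
        print(f"{value}: n={n}, response rate={rate:.2f}")
```

Running it prints pooled response rates by age group for the eight made-up patients. A naive pooled rate like this ignores differences between trials (dosing, enrollment criteria, trial size), and pooling can even reverse an effect seen within every individual trial (Simpson's paradox), so a real analysis would at least stratify by `trial` or use a model that accounts for it.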

…and provides at least one compelling example of the power of computation:

FDA Innovates: Virtual Patient

Medical device design is highly iterative, and the ability to test novel designs within computer models constructed from digital images of diseased and normal human anatomy could greatly reduce the cost, time, and risk to patients normally involved in producing a new medical device. FDA is in the process of developing a Virtual Physiological Patient — a collection of functional computer models including both normal human anatomy and diseased tissues. These models, which are being developed in partnership with stakeholders, will be made publicly available for medical device companies. Once fully developed, the Virtual Physiological Patient may allow personalization of medical devices so a device can be redesigned to suit an individual patient’s anatomy, physiology, and disease state.

Besides the potential of data mining and machine learning, the FDA report describes six other major actions: