Computing Community Consortium Blog

The goal of the Computing Community Consortium (CCC) is to catalyze the computing research community to debate longer range, more audacious research challenges; to build consensus around research visions; to evolve the most promising visions toward clearly defined initiatives; and to work with the funding organizations to move challenges and visions toward funding initiatives. The purpose of this blog is to provide a more immediate, online mechanism for dissemination of visioning concepts and community discussion/debate about them.

Obama Administration Unveils $200M Big Data R&D Initiative

March 29th, 2012 / in big science, policy, research horizons, Research News, resources / by Erwin Gianchandani

(This post has been updated; please scroll down for the latest.)

Throughout the 2008 hurricane season, the Texas Advanced Computing Center was an active participant in a NOAA research effort to develop next-generation hurricane models. Teams of scientists relied on TACC's Ranger supercomputer to test high-resolution ensemble hurricane models and to track evacuation routes from data streams on the ground and from space. Using up to 40,000 processing cores at once, researchers simulated both global and regional weather models and received on-demand access to some of the most powerful hardware in the world, enabling real-time, high-resolution ensemble simulations of the storm. This visualization of Hurricane Ike shows the storm developing in the gulf and making landfall on the Texas coast [image courtesy Gregory P. Johnson, Romy Schneider, John Cazes, Karl Schulz, Bill Barth, The University of Texas at Austin; Frank Marks, NOAA; Fuqing Zhang and Yonghui Weng, Pennsylvania State University; via NSF].

The Obama Administration this morning unveiled details about its Big Data R&D Initiative, committing more than $200 million in new funding through six agencies and departments to improve “our ability to extract knowledge and insights from large and complex collections of digital data.” The effort, spearheaded by the White House Office of Science and Technology Policy (OSTP) and National Science Foundation (NSF), along with the National Institutes of Health (NIH), Department of Defense (DoD), Defense Advanced Research Projects Agency (DARPA), Department of Energy (DoE) Office of Science, and U.S. Geological Survey (USGS), seeks to “advance state-of-the-art core technologies needed to collect, store, preserve, manage, analyze, and share huge quantities of data; harness these technologies to accelerate the pace of discovery in science and engineering, strengthen our national security, and transform teaching and learning; and expand the workforce needed to develop and use Big Data technologies.”

The first wave of commitments to support the Big Data Initiative features a new joint solicitation of up to $25 million supported by NSF and NIH — Core Techniques and Technologies for Advancing Big Data Science and Engineering (BIGDATA) — that will advance foundational research in Big Data. The solicitation aims to (after the jump):

extract and use knowledge from collections of large data sets in order to accelerate progress in science and engineering research. Specifically, it will develop and evaluate new algorithms, statistical methods, technologies, and tools for improved data collection and management, data analytics, and e-science collaboration environments.

Farnam Jahanian, Assistant Director for NSF’s Directorate for Computer and Information Science and Engineering (CISE), noted:

“The Big Data solicitation creates enormous opportunities for extracting knowledge from large-scale data across all disciplines. Foundational research advances in data management, analysis, and collaboration will change paradigms of research and education, and promise new approaches to addressing national priorities.”

For the solicitation, NIH is particularly interested in imaging, molecular, cellular, electrophysiological, chemical, behavioral, epidemiological, clinical, and other data sets related to human health and disease.

In addition to the BIGDATA solicitation, NSF is also issuing several new awards today in support of the initiative:

  • A $10 million Expeditions in Computing award to a team of University of California, Berkeley, researchers, to integrate “algorithms, machines, and people” (cloud computing, machine learning, and crowdsourcing) to generate new knowledge and insights from big data.
  • The first round of awards made under the Foundation’s Cyberinfrastructure Framework for 21st Century Science and Engineering (CIF21). Through a program called EarthCube, these awards will “support community-guided cyberinfrastructure to integrate big data across geosciences,” ultimately transforming how geoscientists access, analyze, and share information about our planet.
  • A $2 million award for a research training group in big data that will support training for undergraduate and graduate students and postdoctoral fellows using novel statistical, graphical, and visualization techniques to study complex data.
  • And a $1.4 million award for a focused research group that brings together statisticians and biologists to develop network models and automatic, scalable algorithms and tools to determine protein structures and biological pathways.

Meanwhile, DoD is “placing a big bet on big data,” launching “Data to Decisions” — an investment of $250 million annually, with $60 million available for new research projects, in a series of programs that will

  • harness and utilize massive data in new ways, and bring together sensing, perception, and decision support to make truly autonomous systems that can maneuver and make decisions on their own; and
  • improve situational awareness to help warfighters and analysts and provide increased support to operations.


[DoD] is seeking a 100-fold increase in the ability of analysts to extract information from texts in any language, and a similar increase in the number of objects, activities, and events that an analyst can observe.

To accelerate innovation in Big Data, DoD will initiate a series of open prize competitions in this space in the coming months.

And DARPA is rolling out the XDATA program, which will provide $25 million annually to

develop computational techniques and software tools for analyzing large volumes of data, both semi-structured (tabular, relational, categorical, and meta-data) and unstructured (text documents, message traffic). Central challenges to be addressed include


  1. developing scalable algorithms for processing imperfect data in distributed data stores; and
  2. creating effective human-computer interaction tools for facilitating rapidly customizable visual reasoning for diverse missions.


The XDATA program will support open source software toolkits to enable flexible software development for users to process large volumes of data in timelines commensurate with mission workflows of targeted defense applications.

Among the other agencies investing in the Big Data R&D Initiative:

To learn more about today’s launch, read the OSTP press release and fact sheet issued this morning, a recent blog post by OSTP Deputy Director for Policy Tom Kalil, and the BIGDATA solicitation.

Be sure to tune in for a live webcast later this afternoon, beginning at 2:00pm EDT, when OSTP and the participating agencies will describe the Big Data R&D Initiative. Among the speakers:

In addition, there will be a panel of thought leaders from academia and industry, including Alex Szalay (Johns Hopkins University), Lucila Ohno-Machado (University of California, San Diego), Daphne Koller (Stanford University), and James Manyika (McKinsey Global Institute). The panel will be moderated by New York Times technology writer Steve Lohr, who authored a piece in today’s paper about the initiative.

And stay tuned — we’ll have yet more details here throughout the day as they become available…


Collaboration and concurrent visualization of 20 simulation runs performed by the Intergovernmental Panel on Climate Change (IPCC) using the HIPerWall (Highly Interactive Parallelized Display Wall) system. Located at the University of California, Irvine, the HIPerWall system is a facility aimed at advancing earth science modeling and visualization by providing unprecedented, high-capacity visualization capabilities for experimental and theoretical researchers. It's being used to analyze IPCC datasets. The room-sized HIPerWall display measures nearly 23 x 9 feet and consists of 50 flat-panel tiles that provide a total resolution of over 200 million pixels, bringing to life terabyte-sized datasets [image courtesy Falko Kuester, California Institute for Telecommunications and Information Technology (Calit2), University of California, San Diego, via NSF].

Updated Thursday, March 29, 2012, at 10:38am EDT: The BIGDATA solicitation comprises all of NSF’s offices and directorates as well as seven NIH institutes — the National Cancer Institute (NCI), National Institute of Biomedical Imaging and Bioengineering (NIBIB), National Institute on Drug Abuse (NIDA), National Institute of General Medical Sciences (NIGMS), National Institute of Neurological Disorders and Stroke (NINDS), National Library of Medicine (NLM), and National Human Genome Research Institute (NHGRI).

From the solicitation:

The Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) solicitation aims to advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large, diverse, distributed and heterogeneous data sets so as to: accelerate the progress of scientific discovery and innovation; lead to new fields of inquiry that would not otherwise be possible; encourage the development of new data analytic tools and algorithms; facilitate scalable, accessible, and sustainable data infrastructure; increase understanding of human and social processes and interactions; and promote economic growth and improved health and quality of life. The new knowledge, tools, practices, and infrastructures produced will enable breakthrough discoveries and innovation in science, engineering, medicine, commerce, education, and national security — laying the foundations for US competitiveness for many decades to come.


The phrase “big data” in this solicitation refers to large, diverse, complex, longitudinal, and/or distributed data sets generated from instruments, sensors, Internet transactions, email, video, click streams, and/or all other digital sources available today and in the future.


This solicitation is one component in a long-term strategy to address national big data challenges, which include advances in core techniques and technologies; big data infrastructure projects in various science, biomedical research, health and engineering communities; education and workforce development; and a comprehensive integrative program to support collaborations of multi-disciplinary teams and communities to make advances in the complex grand challenge science, biomedical research, and engineering problems of a computational- and data-intensive world.


Today, U.S. government agencies recognize that the scientific, biomedical and engineering research communities are undergoing a profound transformation with the use of large-scale, diverse, and high-resolution data sets that allow for data-intensive decision-making, including clinical decision making, at a level never before imagined. New statistical and mathematical algorithms, prediction techniques, and modeling methods, as well as multidisciplinary approaches to data collection, data analysis and new technologies for sharing data and information are enabling a paradigm shift in scientific and biomedical investigation. Advances in machine learning, data mining, and visualization are enabling new ways of extracting useful information in a timely fashion from massive data sets, which complement and extend existing methods of hypothesis testing and statistical inference. As a result, a number of agencies are developing big data strategies to align with their missions. This solicitation focuses on common interests in big data research across the National Institutes of Health (NIH) and the National Science Foundation (NSF).


This initiative will build new capabilities to create actionable information that leads to timely and more informed decisions. It will both help to accelerate discovery and innovation, as well as support their transition into practice to benefit society. As the recent President’s Council of Advisors on Science and Technology (PCAST) 2010 review of the Networking Information Technology Research and Development (NITRD) program notes, the pipeline of data to knowledge to action has tremendous potential in transforming all areas of national priority. This initiative will also lay the foundations for complementary big data activities — big data infrastructure projects, workforce development, and progress in addressing complex, multi-disciplinary grand challenge problems in science and engineering.

Through the solicitation, NSF and NIH seek

proposals that develop and evaluate core technologies and tools that take advantage of available collections of large data sets to accelerate progress in science, biomedical research, and engineering. Each proposal should include an evaluation plan.

Proposals can focus on one or more science and engineering perspectives on big data: data collection and management; data analytics; and e-science collaboration environments. In addition, they must also include a description of how the project will build capacity, either through appropriate models, policies, and technologies to support responsible and sustainable big data stewardship; training and communication strategies, targeted to the various research communities and/or the public; or sustainable, cost-effective infrastructure for data storage, access, and shared services. Projects may choose to focus on an area of national priority such as health IT, emergency response and preparedness, clean energy, cyberlearning, materials genome, national security, and advanced manufacturing — but this is optional.

There are two sizes of projects:

1. Small projects: One or two investigators can ask for up to $250,000 per year for up to three years.

2. Mid-scale projects: Three or more investigators can ask for funding between $250,001 and $1,000,000 per year for up to five years.

Scientists and engineers from all disciplinary areas — including computer science — are encouraged to participate.

The deadline for full proposals is June 13, 2012, for mid-scale projects and July 11, 2012, for small projects.

The solicitation limits each PI to two submissions.


Updated Thursday, March 29, 2012, at 6:35pm EDT: Farnam Jahanian, Assistant Director for NSF’s Directorate for Computer and Information Science and Engineering (CISE), has just issued the following letter to the community describing today’s announcement:

Farnam Jahanian, Assistant Director for NSF/CISE

Dear Computer and Information Science and Engineering (CISE) Community,


This afternoon at a White House event, the Administration unveiled a Big Data Research and Development Initiative, which creates enormous opportunities for extracting knowledge and insights from large and complex collections of digital data. The CISE community is well poised to become an active participant in this new initiative.


NSF Director Subra Suresh joined other federal science agency leaders to discuss cross-agency plans and announce new research efforts to address big data. NSF will direct its current efforts to develop new methods to derive knowledge from data; construct new infrastructure to manage, curate, and serve data to communities; and forge new approaches for associated education and training.


The cornerstone of the announcements includes a joint NSF-NIH solicitation on foundational research for big data.  The “Core Techniques and Technologies for Advancing Big Data Science & Engineering,” or “Big Data” program aims to advance the core scientific and technological means of managing, analyzing, visualizing and extracting information from large, diverse, distributed, and heterogeneous data sets in order to accelerate progress in science and engineering research. Specifically, it will fund research to develop and evaluate new algorithms, technologies, and tools for improved data management, data analytics, and e-science collaboration environments.


Other announcements included anticipated cross-disciplinary efforts such as an Ideas Lab to explore ways to use big data to enhance teaching and learning effectiveness, and the use of NSF’s Integrative Graduate Education and Research Traineeship, or IGERT, mechanism to educate and train researchers in data enabled science and engineering.


For more information, please see the NSF press release, and the OSTP press release. We look forward to your participation.






Farnam Jahanian

Assistant Director for CISE

National Science Foundation


Updated Friday, March 30, 2012, at 11:30pm EDT: Watch video from Thursday’s Big Data R&D Initiative rollout below:

(Contributed by Erwin Gianchandani, CCC Director)
