Computing Community Consortium Blog



“Emerging Challenges of Data-Intensive Scientific Computing”

November 19th, 2011 / in big science, CCC, research horizons, resources / by Erwin Gianchandani

[Image: Computing in Science and Engineering’s November/December 2011 special issue on Big Data; courtesy IEEE Computer Society/Computing in Science and Engineering.]

Computing in Science and Engineering is out with a special issue for November/December 2011 focused on Big Data — and the significant research opportunities emerging from a growing wealth of scientific data. As guest editors Francis Alexander (Los Alamos National Laboratory), Adolfy Hoisie (Pacific Northwest National Laboratory), and Alexander Szalay (Johns Hopkins University) write in their introduction:

With the exponential growth in data acquisition and generation — whether by next-generation telescopes, high-throughput experiments, petascale scientific computing, or high-resolution sensors — it’s an extremely exciting time for scientific discovery. As a result of these technological advances, the next decade will see even more significant impacts in fields such as medicine, astronomy and cosmology, materials science, social sciences, and climate. Discoveries will likely be made possible with amounts of data previously unavailable…

But as Alexander, Hoisie, and Szalay note, to do that we must address several challenges:

It’s important that we embrace and promote a balanced approach to addressing the challenges of data-intensive scientific computing. In addition to the hardware investments required, there’s a pressing need to invest in research and development of analysis algorithms. For example, data collection methods often fail to conform to the hypotheses in statistical analyses, and the data might not be independent and identically distributed. This is almost always true for data from experiments and observations of physical systems. There’s also a serious need to develop deterministic, scalable algorithms for analysis, as well as randomized algorithms that are robust to almost certain hardware failures. In some cases, there’s sufficient data, but scientists are faced with a semantic gap. Such a gap occurs, for example, in analyzing video streams.


There’s also the risk that for some problems, there still might not be enough data to reach any defensible conclusions. So, as a research community, we shouldn’t abandon work on developing analysis tools for extracting knowledge from limited data…
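As a purely illustrative aside, not drawn from the editorial, the sketch below shows one classic example of the kind of scalable, single-pass randomized analysis primitive the editors allude to: reservoir sampling, which maintains a uniform random sample of a data stream far too large to store in memory.

```python
# Reservoir sampling (Algorithm R): keep a uniform random sample of k items
# from a stream of unknown, possibly enormous length, using O(k) memory and
# a single pass. A hypothetical illustration, not taken from the special issue.
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)       # item i survives with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

# Example: sample 5 readings from a million-element stream without storing it.
print(reservoir_sample(range(10**6), k=5))
```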

Key articles in the Computing in Science and Engineering special issue:

In “Data-Intensive Science in the Department of Energy: Case Studies and Future Challenges,” James P. Ahrens and his coauthors use a case study approach to tease out the common challenges, including network and analysis infrastructure, that must be met for data-intensive projects to succeed. The case studies draw on data from large-scale climate and cosmology simulations as well as x-ray and neutron scattering data from DOE user facilities such as Argonne National Laboratory’s Advanced Photon Source and Oak Ridge National Laboratory’s Spallation Neutron Source. Ahrens and his coauthors discuss the workflow models and the data, software, and architectures required for success.


In his article, “Data-Intensive Scalable Computing for Scientific Applications,” [CCC Council member] Randal E. Bryant explores the scalability requirements for data-intensive scientific computing, both for managing the data and for carrying out large-scale numerical calculations with massive datasets. Bryant argues that future data-intensive scientific computing systems will differ considerably from the more traditional HPC systems used by the scientific community today, such as compute-intensive, diskless clusters. He also discusses how a new class of highly adaptive and scalable architectures developed and used by Internet-based service companies could come to the rescue.
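To make the contrast concrete, here is a minimal, self-contained sketch (in Python, and not taken from Bryant’s article) of the data-parallel map/reduce programming model around which many of those Internet-scale architectures are built. The toy framework below runs in a single process; real systems spread the same two-phase structure across thousands of disks and nodes, handling partitioning, scheduling, and recovery from failures on the programmer’s behalf.

```python
# Minimal sketch of the map/reduce model exposed by data-intensive scalable
# computing frameworks: user code supplies a mapper and a reducer, and the
# framework handles grouping values by key. Hypothetical example, single-process.
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

def map_reduce(records: Iterable[Any],
               mapper: Callable[[Any], Iterable[Tuple[Any, Any]]],
               reducer: Callable[[Any, List[Any]], Any]) -> Dict[Any, Any]:
    # Map phase: each record is turned into zero or more (key, value) pairs.
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: values sharing a key are combined independently, which is
    # what makes the computation straightforward to scale out.
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical use: average measurement per instrument from raw readings.
readings = [("telescope_a", 1.2), ("telescope_b", 3.4), ("telescope_a", 2.0)]
result = map_reduce(
    readings,
    mapper=lambda rec: [(rec[0], rec[1])],
    reducer=lambda key, vals: sum(vals) / len(vals),
)
print(result)  # {'telescope_a': 1.6, 'telescope_b': 3.4}
```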


Finally, in “Extreme Data-Intensive Scientific Computing,” Alexander S. Szalay focuses on the challenges faced by — but by no means unique to — the astronomy community, where new telescopes coming online will generate petabytes of data per day. Szalay discusses the notion of balance characterized via Amdahl’s law. He also offers a pathway forward using commodity hardware, given both the practical budgetary constraints that most universities and research centers face these days and the emerging power wall, which will soon become the limiting factor in deploying high-performance computing.
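On the balance point, the rule of thumb commonly attributed to Amdahl can be written as below; this is a general formulation from outside the article, not a quotation of Szalay’s treatment.

```latex
% Amdahl's I/O balance rule of thumb, stated from general knowledge rather
% than quoted from Szalay's article: a system is "balanced" for
% data-intensive work when its Amdahl number is close to one.
\[
  \text{Amdahl number} \;=\;
  \frac{\text{sequential I/O bandwidth (bits/s)}}{\text{instruction rate (instructions/s)}}
  \;\approx\; 1
\]
% Systems whose ratio falls far below one leave processors idle waiting on
% storage, which is the regime petabyte-per-day instruments push against.
```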

Check out the entire special issue here.

And while we’re on the subject, be sure to review a series of white papers articulating the Big Data challenges in areas of national priority (including healthcare, energy, transportation, national defense, and so on) that the CCC produced last fall.

(Contributed by Erwin Gianchandani, CCC Director)

