NPR’s Diane Rehm Show on Monday featured an hour-long discussion among several thought leaders, titled “The New World of Massive Data Mining,” about the Federal government’s new Big Data R&D Initiative:
Every time you go on the Internet, make a phone call, send an email, pass a traffic camera or pay a bill, you create [electronic data]. In all, 2.5 quintillion bytes of data are created each day. This massive pile of information from all sources is called “Big Data.” It gets stored somewhere, and every day the pile gets bigger. Government and industry are finding new ways to analyze it. Last week the administration announced an initiative to aid the development of Big Data computing. A panel of experts joins guest host Tom Gjelten to discuss the opportunities — for business, science, medicine, education, and security … but also the privacy concerns.
Among the discussants:
- Suzanne Iacono, co-chair, Federal Big Data Senior Steering Group, and senior science adviser, Directorate for Computer and Information Science and Engineering (CISE) at the National Science Foundation (NSF);
- Daphne Koller, professor of computer science, Stanford University Artificial Intelligence Laboratory;
- John Villasenor, senior fellow at the Brookings Institution and professor of electrical engineering at University of California, Los Angeles; and
- Michael Leiter, senior counselor, Palantir Technologies, and former director, National Counterterrorism Center.
Leiter laid out the challenge:
It’s not just the volume of the data… [but] it’s also the speed with which it’s coming in, and also the variety of forms of that data. It can be text, it can be weblog records, it can be video, it can be pictures — all of that data becomes more and more overwhelming. And the difficulty of course is trying to stay in front of that — trying to make sure you know what you have and how different pieces within different data sets are correlated with one another…
It requires, first of all, integrating that data — it’s not just looking at one stovepipe of information; it’s comparing one source of information with other sources and seeing where there are correlations that are meaningful. Second, it’s being able to do so in a very flexible, agile way, so a human being can manipulate and play with that data… You’re not just relying on a set of algorithms that supposedly spit out an answer; [rather] people can crawl through that data and identify what is meaningful, test hypotheses, and then look in other areas.
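Leiter’s point about correlating across stovepipes can be illustrated with a toy sketch. The data sets, keys, and figures below are entirely hypothetical: two sources keyed by day are joined on their shared keys, then checked for a linear relationship with Pearson’s r.

```python
# Toy illustration of cross-source correlation (all data hypothetical):
# join two "stovepipes" on a shared key, then measure how strongly
# they move together.
from math import sqrt

# Source A: daily search volume for a flu-related term (hypothetical)
searches = {"mon": 120, "tue": 150, "wed": 200, "thu": 260, "fri": 310}
# Source B: daily clinic visits reported in the same region (hypothetical)
visits = {"mon": 14, "tue": 18, "wed": 25, "thu": 31, "fri": 38}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# "Integrate" the stovepipes: keep only the days present in both sources.
shared = sorted(searches.keys() & visits.keys())
r = pearson([searches[d] for d in shared], [visits[d] for d in shared])
print(f"Pearson r across {len(shared)} shared days: {r:.3f}")
```

A strong r here would only flag a candidate relationship for a human analyst to probe — exactly the hypothesis-testing loop Leiter describes, not an algorithm “spitting out an answer.”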
Iacono described the interests of the U.S. government in Big Data (following the link):