Archive for the ‘big science’ category


Big Data in the Classroom

October 23rd, 2014
"Five Reasons 'Big Data' is a Big Deal" [image courtesy Mobiledia].

Data sets are growing rapidly. Yahoo, Google, and Amazon work with data sets consisting of billions of items, and the size and scale of data, already overwhelming today, will only increase as the Internet of Things matures. Data sets are also becoming increasingly complex. It is therefore ever more important to grow the pool of qualified scientists and engineers who can extract value from big data.

The National Academies released a report on training students to extract value from big data based on a Committee on Applied and Theoretical Statistics (CATS) workshop that occurred in April 2014.

From the report:

Training students to be capable in exploiting big data requires experience with statistical analysis, machine learning, and computational infrastructure that permits the real problems associated with massive data to be revealed and, ultimately, addressed. Analysis of big data requires cross-disciplinary skills, including the ability to make modeling decisions while balancing trade-offs between optimization and approximation, all while being attentive to useful metrics and system robustness. To develop those skills in students, it is important to identify whom to teach, that is, the educational background, experience, and characteristics of a prospective data science student; what to teach, that is, the technical and practical content that should be taught to the student; and how to teach, that is, the structure and organization of a data science program.

Click here to see The National Academies report.

Ebola-Fighting Robots

October 22nd, 2014

Credit: Worcester Polytechnic Institute

Could robots really aid in the Ebola fight?

On November 7th, robotics researchers from around the country will come together to try to answer that question. They will examine whether robots could help prevent the spread of Ebola, for example by decontaminating equipment or even burying victims.

Robin Murphy, a professor of computer science and engineering at Texas A&M University and a former CCC Council member, is helping to organize the Safety Robotics for Ebola Workers workshop. The workshop will bring together health care workers, relief workers, and roboticists. It is co-hosted by the White House Office of Science and Technology Policy, Texas A&M, Worcester Polytechnic Institute, and the University of California, Berkeley.

The goal of the workshop is for the roboticists to hear directly from those who have been working on the outbreak. That way they can learn what is needed to help patients, prevent the spread of the virus, and protect aid workers from infection.

Click here to learn more and see the Computerworld article.

NIST Global City Teams Challenge Report

October 20th, 2014

smart america global city teams

The National Institute of Standards and Technology (NIST) launched its Global City Teams Challenge with a great deal of energy and enthusiasm last month. The workshop ended with more than a dozen presentations by potential Global City Teams Challenge teams and provided an opportunity for interested parties to discuss Internet-of-Things deployments in a smart city environment.

From the workshop report:

The US Ignite website now contains materials related to 20+ potential Global City Teams projects or Action Clusters.

  1. If you would like to learn more about one of the listed projects, or if you are interested in becoming associated with one of the projects, please email;
  2. If you have an existing or new smart city project that you would like to conduct under the auspices of the Global City Teams Challenge, please email; and
  3. If you have contributed to a project on the list but have not yet been contacted by a team leader, please email.

If you are interested in the Challenge and were not able to attend the kick-off workshop, there is a webinar scheduled for Wednesday, October 22 at 10:00am (US Eastern Time). Please use this link for the upcoming webinar: To call into the webinar, please dial 1 (408) 650-3131 and use passcode 832557933#. The webinar will last no more than an hour and will include status updates from NIST and a Q&A session.

For more information, see the Global City Team Challenge website and the Smart America Global City Team website.


Accelerating the Big Data Innovation Ecosystem

September 4th, 2014


In March 2012, the Obama Administration announced the “Big Data Research and Development Initiative.” Its goal is to help solve some of the Nation’s most pressing challenges by improving our ability to extract knowledge from large and complex collections of digital data. The Administration encouraged multiple stakeholders, including federal agencies, private industry, academia, state and local government, non-profits, and foundations, to develop and participate in Big Data innovation projects across the country.

The National Science Foundation is exploring the establishment of a national network of “Big Data Regional Innovation Hubs.” These Hubs will help sustain new regional and grassroots partnerships around Big Data. Potential roles for the Hubs include, but are not limited to:

  • Accelerate the ideation and development of Big Data solutions to specific global and societal challenges by convening stakeholders across sectors to partner in results-driven programs and projects.
  • Act as a matchmaker between the various academic, industry, and community stakeholders to help drive successful pilot programs for emerging Big Data technology.
  • Coordinate across multiple regions of the country, based on shared interests and industry sector engagement, to enable dialogue and share best practices.
  • Aim to increase the speed and volume of technology transfer between universities, public and private research centers and laboratories, large enterprises, and SMBs.
  • Facilitate engagement with opinion and thought leaders on the societal impact of Big Data technologies, so as to maximize the positive outcomes of adoption while reducing unwanted consequences.
  • Support the education and training of the entire Big Data workforce, from data scientists to managers to data end-users.

The National Science Foundation (NSF) seeks input from stakeholders across academia, state and local government, industry, and non-profits, spanning all parts of the Big Data innovation ecosystem, on the formation of Big Data Regional Innovation Hubs. Please submit a response of no more than two pages to outlining:

  1. The goals of interest for a Big Data Regional Hub, with metrics for evaluating the success or failure of the Hub to meet that goal;
  2. The multiple stakeholders that would participate in the Hub and their respective roles and responsibilities;
  3. Plans for initial and long-term financial and in-kind resources that the stakeholders would need to commit to this hub; and
  4. A principal point of contact.

Please submit responses no later than November 1, 2014. For more information, see the NSF announcement.


Computing a Cure for HIV

June 27th, 2014

Credit: UIUC

On June 26, the National Science Foundation (NSF) released a Discovery article titled Computing a Cure for HIV, written by Aaron Dubrow, Public Affairs Specialist in the Office of Legislative & Public Affairs. The article provides an overview of the disease and how it continues to afflict millions of people worldwide.

Over the past decade, scientists have been using the power of supercomputers “to better understand how the HIV virus interacts with the cells it infects, to discover or design new drugs that can attack the virus at its weak spots and even to use genetic information about the exact variants of the virus to develop patient-specific treatments.”

Here are nine projects that are using supercomputers and computational techniques to help fight the disease:

  1. Modeling HIV: from atoms to actions
  2. Discovery of hidden pocket in HIV protein leads to ideas for new inhibitors
  3. Preventing HIV from reaching its mature state
  4. Crowdsourcing a cure
  5. Virtual screening of HIV inhibitors
  6. Membrane effects
  7. Computing patient-specific treatment methods
  8. Preparing the next generation to continue the fight
  9. A boy and the BEAST

You can read more about these projects in the full article here.

Recent ISAT/DARPA Workshop Targeted Approximate Computing

June 23rd, 2014

The following is a special contribution to this blog by CCC Executive Council Member Mark Hill and workshop organizers Luis Ceze, Associate Professor in the Department of Computer Science and Engineering at the University of Washington, and James Larus, Full Professor and Head of the School of Computer and Communication Sciences at the Ecole Polytechnique Federale de Lausanne.

Luis Ceze and Jim Larus organized a DARPA ISAT workshop on Approximate Computing in February 2014. The goal was to discuss how to obtain 10-100x improvements in performance and MIPS/watt from future hardware by carefully trading off the accuracy of a computation for these other goals. The focus was not the underlying technology shifts, but rather the likely radical shifts required in hardware, software, and basic computing system properties to pervasively embrace accuracy trade-offs.

Below we provide a more detailed motivation for approximate computing; the publicly released slides are available here.

Given the end of Moore’s Law performance improvements and the imminent end of Dennard scaling, it is imperative to find new ways to improve the performance and energy efficiency of computer systems, so as to permit larger and more complex problems to be tackled within constrained power envelopes, package sizes, and budgets. One promising approach is approximate computing, which relaxes the traditional digital orientation of precisely stated and verified algorithms, reproducibly and correctly executed on hardware, in favor of approximate algorithms that produce “sufficiently” correct answers. The sufficiency criterion can be probabilistic, requiring that results are usually correct, or a more complex correctness condition, such as requiring that the most “significant” bits of an answer be correct.
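To make the probabilistic sufficiency criterion concrete, here is a minimal, purely illustrative Python sketch (our own toy example, not from the workshop): an approximate mean computed from a small random sample, which reads only a fraction of the data yet is usually within a tight relative-error bound of the exact answer.

```python
import random

def exact_mean(data):
    """Baseline: touch every element."""
    return sum(data) / len(data)

def approximate_mean(data, sample_frac=0.1, seed=0):
    """Estimate the mean from a random sample.

    Trades accuracy for work: only sample_frac of the elements
    are read, so the cost drops roughly 10x at the default
    setting, while the estimate is usually within a small
    relative error for well-behaved data.
    """
    rng = random.Random(seed)
    k = max(1, int(len(data) * sample_frac))
    sample = rng.sample(data, k)
    return sum(sample) / k

data = list(range(1_000_000))
exact = exact_mean(data)
approx = approximate_mean(data)

# Probabilistic sufficiency criterion: the answer is "usually
# correct" to within a relative-error bound, not bit-exact.
rel_err = abs(approx - exact) / exact
print(f"exact={exact:.1f} approx={approx:.1f} rel_err={rel_err:.5f}")
```

The application, not the system, decides whether that error bound is "sufficient," which is exactly the contract approximate computing asks programmers to make explicit.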

Approximation introduces another degree of freedom that can be used to improve computer system performance and power efficiency. For example, at one end of the spectrum of possible approximations, one can imagine computers whose circuit implementations employ aggressive voltage and timing optimizations that might introduce occasional non-deterministic errors. At the other end of the spectrum, one can use analog computing techniques in select parts of the computation. One can also imagine entirely new ways of “executing” programs that are inherently approximate; e.g., what if we used neural networks to carry out “general” computations like browsing the web, running simulations, or doing search, sorting, and compression of data? Approximation opportunities go beyond computation, since we can also imagine ways of storing data approximately that lead to potential retrieval errors but are much denser, faster, and more energy efficient. Relaxing data communication is another possibility, since almost all forms of communication (on-chip, off-chip, wireless, etc.) use resources to guarantee data integrity, which is often unnecessary from the application’s point of view.
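The approximate-storage idea can be illustrated with nothing more than the Python standard library: storing a 64-bit double as an IEEE-754 half-precision value is 4x denser but silently discards precision. This is only a software analogy for the denser, error-prone memories described above, not a real approximate memory.

```python
import struct

def store_approx(x):
    # Pack a Python float into IEEE-754 half precision:
    # 1 sign bit, 5 exponent bits, 10 mantissa bits.
    # 4x denser than a 64-bit double, but only about three
    # decimal digits survive -- an approximate-storage trade-off.
    return struct.pack('<e', x)

def load_approx(b):
    # Read the value back; the low-order bits are gone for good.
    return struct.unpack('<e', b)[0]

x = 3.141592653589793
b = store_approx(x)
y = load_approx(b)
print(f"stored in {len(b)} bytes, read back {y}, error {abs(y - x):.2e}")
```

Whether the lost bits matter is application-specific: they are harmless in, say, a pixel buffer, but fatal in a financial ledger, which is why approximate storage needs a way for programs to mark which data may be degraded.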

Obviously, approximation is not a new idea; it has been used in many areas, such as lossy compression and numeric computation. However, these applications of the idea were implemented in specific algorithms, which ran as part of a larger system on a conventional processor. Much of the benefit of approximation may accrue from taking a broader systems perspective, for example by relaxing storage requirements for “approximate data.” Yet there has been little contemplation of what an approximate computer system would look like. What happens to the rest of the system when the processor evolves to support approximate computation? What is a programming model for approximate computation? What will programming languages and tools that directly support approximate computation look like? How do we prove approximate programs “correct”? Is there a composability model for approximate computing? How do we debug such programs? What will the system stack that supports approximate computing look like? How do we handle backward compatibility?
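As one toy answer to the programming-model question, imagine a source-level annotation that lets a caller declare that an approximate result is acceptable, allowing the system to substitute a cheaper implementation. The sketch below is entirely hypothetical: the names `approx_version_of` and `run` are invented for illustration and correspond to no real system, and the dict-based dispatch stands in for what a compiler or the hardware would do.

```python
import math

# Registry mapping exact functions to cheaper approximate
# substitutes. In a real system this dispatch might be done
# by a compiler or the hardware; here it is just a dict.
_APPROX_IMPLS = {}

def approx_version_of(exact_fn):
    """Register the decorated function as an approximate
    substitute for exact_fn."""
    def register(fn):
        _APPROX_IMPLS[exact_fn] = fn
        return fn
    return register

def run(fn, *args, allow_approx=False):
    """Dispatch to the approximate substitute only when the
    caller declares that an approximate result is acceptable."""
    impl = _APPROX_IMPLS.get(fn, fn) if allow_approx else fn
    return impl(*args)

def softmax_denom(xs):
    """Exact softmax denominator: sum of exponentials."""
    return sum(math.exp(x) for x in xs)

@approx_version_of(softmax_denom)
def softmax_denom_approx(xs):
    """Cheap substitute: keep only the dominant term, often
    adequate when one input dominates the others."""
    return math.exp(max(xs))

print(run(softmax_denom, [1.0, 5.0]))
print(run(softmax_denom, [1.0, 5.0], allow_approx=True))
```

The key design point the sketch illustrates is that the *caller* opts in to approximation explicitly, which is one candidate answer to the composability and correctness questions above.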