Metagenomics and the Computing Challenges of Microbial Communities

November 6th, 2009 by Ran Libeskind-Hadas

Why should you care about microbial communities?
Except for viruses, they are the most abundant life on Earth, and they have an
overwhelming effect on our environment and our lives. Consider that about
half the carbon dioxide on Earth is processed by microbes that live in
the oceans. Then consider that the most advanced climate models of ocean life
include just five organisms, despite recent findings that point to
thousands of oceanic species that perform many different functions and
presumably influence our climate.

Metagenomics is a relatively new field that seeks to understand the
structure and function of the shockingly large number of microorganisms on
our planet. New technologies now let us sequence samples taken directly from
their environment rather than only organisms that can be cultivated in the lab. For
example, Craig Venter’s Global Ocean Sampling Expedition has collected seawater samples from oceans around the world, captured the organisms in them, and sequenced their DNA. In the initial pilot study alone, nearly 150 new types of bacteria were discovered through this process.

The science and computing challenges are huge. A single gram of soil
contains approximately one trillion base pairs of DNA. Scientists at the National Institutes of Health recently compared over 100,000 bacterial gene sequences from human skin and found far more kinds of bacteria living there than had previously been known (Science, May 28, 2009). Sequencing these data and making sense of them introduces new computational problems, not merely slight extensions of existing ones.
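To make the soil figure concrete, here is a back-of-the-envelope sketch of how much raw storage one gram's worth of DNA would need. It assumes (our assumption, not a claim from the post) one strand per base pair at 2 bits per base, since there are only four nucleotides, and ignores compression. The function name `raw_storage_bytes` is hypothetical. Real sequencing output is larger still, because each region is typically read several times and reads carry per-base quality scores.

```python
# Back-of-the-envelope storage estimate for the DNA in one gram of soil.
# Assumptions (ours, not from the post): ~1e12 base pairs per gram, as quoted
# above, stored as one strand at 2 bits per base (A, C, G, T), with no
# repeated reads, quality scores, or compression.

BASE_PAIRS_PER_GRAM = 10**12  # order-of-magnitude figure quoted in the text
BITS_PER_BASE = 2             # four nucleotides fit in 2 bits


def raw_storage_bytes(base_pairs: int, bits_per_base: int = BITS_PER_BASE) -> int:
    """Return the bytes needed to store one strand of `base_pairs` bases."""
    return base_pairs * bits_per_base // 8


if __name__ == "__main__":
    size = raw_storage_bytes(BASE_PAIRS_PER_GRAM)
    # prints "250,000,000,000 bytes (~250 GB) per gram of soil"
    print(f"{size:,} bytes (~{size / 1e9:.0f} GB) per gram of soil")
```

Even under these generous assumptions, a single gram of soil corresponds to roughly a quarter of a terabyte of sequence, before any analysis begins.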

The potential impacts of understanding these data are huge as well. In the
case of soil, microbial communities affect carbon sequestration, and
understanding them may help us clean up toxic waste. In our bodies,
microbial cells are estimated to outnumber our human cells by ten to one,
and they play important roles in protecting our skin, aiding digestion, and
much more. Understanding these large microbial communities is therefore likely
to have a positive impact on human health. The NIH has launched the Human
Microbiome Project to support work in this field.

Complete DNA sequences of thousands of organisms are piling up in databases
because DNA sequencing technologies have become so efficient. Most of these
data remain unanalyzed for several reasons. We don’t yet know the right
biological questions to ask. We don’t yet have the clever programs that
would pose these questions to the computer. And there is now so much
data that many questions overwhelm even today’s high-performance
computers.

Among the computational challenges in this field are the design of new
algorithms and of cloud computing technologies. In the National Academies
publication “The New Science of Metagenomics: Revealing the Secrets
of Our Microbial Planet,” the authors conclude: “What then, will metagenomics
have become, in 20 years? We believe that it too will be a concept-driven
computational science… We can expect, in 20 years, enormous advances on
three fronts – technical, computational, and biological – as well as a host
of specific applications.”

We encourage our community to explore and engage in this and other emerging
fields at the crossroads of biology and computation. This is one of the
exciting areas in 21st-century computing.

Contributed by Bill Feiereisen with assistance from Ran Libeskind-Hadas