Computing Community Consortium Blog


Store your (Big) Data in the Code of Life?

May 19th, 2016 / in CCC, research horizons, Research News / by Helen Wright

The following is a special contribution to this blog by CCC Executive Council Member Mark D. Hill of the University of Wisconsin-Madison. Full disclosure: He is working with one of the authors (Luis Ceze) and with Tom Wenisch on visioning via Architecture 2030 at ISCA 2016.

The invention of writing enabled us to reliably transmit information into the future. Stone tablets, papyrus, vellum, and paper can be read centuries if not millennia later. But how much of the digital information that we created over the last 75 years will be readable much later? How much is even readable now?

Wouldn’t it be valuable if we could record digital information in a medium that will last centuries and which we have incentive to always be able to read? Even better would be a medium that permits dense, high-volume storage with reasonable access time. Magnetic tape and optical disks can last decades to a century, but they are not dense enough for truly massive data, and their formats quickly become obsolete, so the data must be rewritten.

Researchers at the University of Washington and Microsoft Research have taken a step in this direction in a paper presented at the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), titled A DNA-Based Archival Storage System. The medium they propose is DNA! DNA lasts (researchers have read 40,000-year-old Neanderthal DNA) and, because it is the code of life, we have great incentive to remember how to read it. It is also digital: each nucleotide can in theory encode two bits by selecting one of adenine (A), cytosine (C), guanine (G), and thymine (T). But there are challenges.
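
To make the two-bits-per-nucleotide idea concrete, here is a minimal Python sketch. The particular assignment of bit pairs to bases and the function names are assumptions chosen for illustration; this is the theoretical maximum-density mapping, not the encoding used in the paper.

```python
# Hypothetical mapping: which bit pair stands for which base is an arbitrary choice.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a strand of A/C/G/T, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): map each nucleotide back to its two bits."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Each byte becomes four nucleotides, so this mapping has no room to detect or correct errors.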

First, one must be able to reliably write DNA. Fortunately, the biotech industry has developed the basic tools for de novo DNA synthesis. Still, they need to be scaled by several orders of magnitude before DNA storage becomes viable.

Second, data must be encoded with more redundancy than two bits per nucleotide allows, because raw error rates in DNA writing and reading are relatively high. But luckily computer scientists are pretty good at coding. Indeed, the results presented at ASPLOS retrieved all stored data bit for bit, despite high raw error rates in the DNA write and read process.
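
The trade of density for reliability can be illustrated with the simplest possible redundancy: store several copies of a strand and take a per-position majority vote when reading. This is a sketch only, not the scheme from the ASPLOS paper, and it assumes errors are substitutions (no inserted or deleted nucleotides), so every read has the same length. The function name and the sample reads are hypothetical.

```python
from collections import Counter


def majority_vote(reads: list[str]) -> str:
    """Recover a strand from several noisy reads of equal length.

    Assumes substitution errors only; at each position the most
    common nucleotide across the reads wins.
    """
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


# Three reads of the same strand, two of them with one substitution error each.
reads = ["CAGACGGC", "CAGTCGGC", "CAGACGGA"]
print(majority_vote(reads))  # CAGACGGC
```

Three copies cut the effective density to two thirds of a bit per nucleotide, which is why practical systems look for more efficient codes.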

Third, data must be read from DNA. DNA sequencing has been improving very fast: a 10,000X performance improvement in the past decade! If it continues at this pace, it will soon be fast enough for storage.
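
For a sense of scale, the 10,000X-per-decade figure can be converted to an annual rate. The short calculation below assumes the improvement compounded steadily over the ten years, which is a simplification.

```python
import math

improvement = 10_000  # total improvement factor quoted above
years = 10            # period over which it occurred

# Steady compounding assumed: annual_factor ** years == improvement
annual_factor = improvement ** (1 / years)
doubling_time_months = 12 * math.log(2) / math.log(annual_factor)

print(f"about {annual_factor:.2f}x per year")
print(f"doubling roughly every {doubling_time_months:.1f} months")
```

Under that assumption, sequencing performance grew about 2.5X per year, doubling roughly every nine months.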

In summary, while it may sound like science fiction, progress in DNA-based data storage has been rapid, and if it succeeds, it may replace tape as an archival technology.

DNA storage might just be the first step towards building computers using components from biology.
