Risk of error eliminated for storing data on artificial pieces of DNA


British scientists Wednesday announced a breakthrough in the quest to turn DNA into a revolutionary form of data storage.

A speck of man-made DNA can hold mountains of data and can be freeze-dried, shipped and stored, potentially for thousands of years, they said. The contents are “read” by sequencing the DNA, as is routinely done today in genetic fingerprinting and so on, and turning the sequence back into computer code.

“We already know that DNA is a robust way to store information because we can extract it from bones of woolly mammoths, which date back tens of thousands of years, and make sense of it,” said Nick Goldman of the European Bioinformatics Institute in Cambridge. “It’s also incredibly small, dense and does not need any power for storage, so shipping and keeping it is easy.”

The double helix of DNA is a molecular “ladder” whose rungs are built from four chemical bases, adenine (A), cytosine (C), guanine (G) and thymine (T), which team up in pairs. C teams up with G, and T teams up with A. The letter sequence comprises the genome, or the chemical blueprint for making and sustaining life. Human DNA has more than 3 billion letters, coiled into 23 pairs of chromosomes.
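The pairing rule can be shown in a few lines of Python. This is only an illustrative sketch: the function name `complement` is made up here, and the code is not part of the researchers' work.

```python
# Pairing rule from the text: C pairs with G, and T pairs with A.
PAIRS = str.maketrans("ACGT", "TGCA")

def complement(strand):
    """Return the letters that would sit opposite `strand` on the ladder."""
    return strand.translate(PAIRS)

if __name__ == "__main__":
    print(complement("GATTACA"))
```

Because each letter has exactly one partner, applying the rule twice gives back the original strand.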

The project entails taking data in the form of zeros and ones in computing’s binary code and transcribing it into “base-three” code, which uses zeros, ones and twos.

The data are transcribed for a second time into DNA code, which is based on the letters A, C, G and T. A block of five or six letters is used for each byte, or group of eight binary digits. The letters are then turned into molecules, using lab-dish chemicals.
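The two-step transcription can be sketched in Python. This is a simplified illustration, not the researchers' actual scheme. It assumes every byte is written as exactly six base-three digits, and that each digit selects one of the three letters that differ from the previous letter, so that no letter appears twice in a row. The starting letter "A" and all function names are hypothetical.

```python
BASES = "ACGT"

def bytes_to_trits(data):
    """Write each byte as six base-three digits (3**6 = 729 covers 0..255)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            byte, digit = divmod(byte, 3)
            digits.append(digit)
        trits.extend(reversed(digits))  # most significant digit first
    return trits

def trits_to_dna(trits, previous="A"):
    """Map each digit to one of the three letters unlike the previous one."""
    letters = []
    for trit in trits:
        choices = [b for b in BASES if b != previous]
        previous = choices[trit]
        letters.append(previous)
    return "".join(letters)

def dna_to_trits(dna, previous="A"):
    """Invert trits_to_dna by replaying the same letter choices."""
    trits = []
    for letter in dna:
        choices = [b for b in BASES if b != previous]
        trits.append(choices.index(letter))
        previous = letter
    return trits

def trits_to_bytes(trits):
    """Rebuild bytes from consecutive groups of six base-three digits."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)

def encode(data):
    return trits_to_dna(bytes_to_trits(data))

def decode(dna):
    return trits_to_bytes(dna_to_trits(dna))

if __name__ == "__main__":
    strand = encode(b"I have a dream")
    print(strand)
    print(decode(strand))
```

Avoiding repeated letters is one plausible reason to pass through base three, since runs of the same letter tend to be harder for sequencing machines to read reliably.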

The work does not entail using any living DNA, nor does it seek to create any life form. In fact the man-made code would be quite useless in anything biological, the researchers said. “We have absolutely no intention of messing with life,” said Goldman.

Only short strings of DNA can be made, so the message has to be chopped up into small sections of 117 letters. Each section carries a tiny address tag, rather like packet-switching of Internet data, which allows the sections to be reassembled in the right order.
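The chop-and-tag idea can be sketched as follows. This is a minimal illustration under stated assumptions: the address is kept as a plain integer rather than being written in DNA letters, and the function names are invented for the example. Only the 117-letter section length comes from the description above.

```python
import random

SEGMENT_LENGTH = 117  # letters per section, as reported

def split_with_addresses(message, size=SEGMENT_LENGTH):
    """Cut the message into sections, each paired with its position."""
    return [
        (index, message[start:start + size])
        for index, start in enumerate(range(0, len(message), size))
    ]

def reassemble(fragments):
    """Sort the sections by address and join them back together."""
    return "".join(payload for _, payload in sorted(fragments))

if __name__ == "__main__":
    message = "ACGT" * 100                # hypothetical 400-letter message
    fragments = split_with_addresses(message)
    random.shuffle(fragments)             # sections arrive in no set order
    print(reassemble(fragments) == message)
```

The address is what makes the order of arrival irrelevant: the sections can be read in any order and still be put back correctly.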

To prove their concept, the team encoded an MP3 recording of Martin Luther King’s “I Have A Dream” speech, a digital photo of their lab, a PDF of the landmark study in 1953 that described the structure of DNA, a file of all of Shakespeare’s sonnets and a document that describes the data storage technique.

“We downloaded the files from the Web and used them to synthesize hundreds of thousands of pieces of DNA. The result looks like a tiny piece of dust,” said Emily Leproust of Agilent Technologies, a U.S. biotech company that took the digital data and used them to synthesise molecules of DNA in the lab.

Agilent then mailed the sample back across the Atlantic to the European Bioinformatics Institute, where the researchers soaked the DNA in water to reconstitute it and used standard sequencing machines to unravel the code. They recovered and read the files with 100 percent accuracy.

The work follows a big step last year when scientists at Harvard announced they had stored 700 terabytes of data — enough for around 70,000 movies — in a gram of DNA.

The new method eliminates the risk of error when the DNA is read, say the researchers, whose work appears in the journal Nature.

Data are accumulating massively around the world, and storing them all is a headache. Magnetic and optical discs are big, need to be kept in cool, dry conditions and are prone to decay.

“The only limit (for DNA storage) is the cost,” said Ewan Birney, a co-author of the study at the European Bioinformatics Institute.

Sequencing and reading the DNA takes a couple of weeks with present technology, so it is not suitable for jobs needing instant data retrieval.

Instead, it would be appropriate for data that would be stored for between 500 and 5,000 years, such as a doomsday encyclopaedia of knowledge and culture. But on current trends, sequencing costs could fall by a factor of 20 within a decade, making DNA storage economically feasible for time frames of less than 50 years, the authors claim.