The winner of our article competition through Octorank examines progress towards the use of DNA as a method of data storage. Great work, Alison!
A recent publication in the March edition of the scientific journal Science reports an important advance in using DNA for data storage. This development has prompted a fresh look at the idea of using DNA as a stable archival medium. How close are we to realizing this idea, and how does it fit into the European biotech landscape?
Deluge of Data
With the mass adoption of the internet, social media and big data approaches in business and government, an analytics blog by Northeastern University estimates that the sum of data produced each day is 2.5 exabytes — that’s roughly 530 trillion copies of the song, ‘Around the World,’ by Daft Punk.
However, the rate of data production far exceeds the growth in storage capacity. This problem is compounded by the fact that most data storage technologies have a lifespan of about 20 years. One elegant solution proposed to address both capacity and durability is to use the fundamental data storage unit of biology, DNA.
A Reliable 10,000-year-old Hard Drive
The use of DNA as a means of high-density data storage was first introduced to the scientific community in 2012-2013 through independent proof-of-principle experiments by George Church and colleagues at Harvard University and Nick Goldman and colleagues at the European Bioinformatics Institute. DNA, when stored in cool, dry conditions, is remarkably stable. The successful sequencing of 700,000-year-old horse DNA recovered from the Arctic permafrost is a testament to its longevity.
DNA also has an incredible information density: recall that the 3 billion base pairs that encode our genome are folded into a structure with a diameter of a mere 6 µm. Furthermore, while the speed of innovation rapidly demotes technological devices to vintage curiosities, the need to read the genetic code will persist for as long as humans do. Despite such promise, several barriers must be overcome before the use of DNA as a storage device becomes a reality.
Mind the Read-Write Gap
Advances in sequencing (reading) DNA have far outpaced those in synthesizing (writing) DNA. The time needed to sequence a human genome has plummeted from 13 years to 24 hours, and it can now be done for around €900. At the current rate of €0.25 per synthesized base pair, writing a whole human genome would cost a hefty sum of roughly €1B.
Commercial ventures looking to revolutionize the field of DNA synthesis include American start-ups like Gen9 (recently acquired by Ginkgo Bioworks) and Twist, which are implementing multiplex platforms to scale up processes. Other companies such as Molecular Assemblies are developing novel enzyme-based methods to synthesize DNA.
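As a rough check on that figure, the arithmetic can be sketched in a few lines of Python. The genome size of about 3 billion base pairs and the €0.25 price are the figures quoted in this article; the flat per-base price and the names used here are simplifying assumptions for illustration only.

```python
# Back-of-the-envelope synthesis cost, using figures quoted in the article.
GENOME_BASE_PAIRS = 3_000_000_000   # ~3 billion base pairs in the human genome
COST_PER_BASE_PAIR_EUR = 0.25       # quoted synthesis price per base pair

def synthesis_cost_eur(base_pairs: int, cost_per_bp: float) -> float:
    """Return the naive cost of synthesizing `base_pairs` at a flat per-base price."""
    return base_pairs * cost_per_bp

cost = synthesis_cost_eur(GENOME_BASE_PAIRS, COST_PER_BASE_PAIR_EUR)
print(f"Estimated synthesis cost: €{cost / 1e9:.2f}B")
```

Under these assumptions the estimate comes to about €0.75B, the same order of magnitude as the roughly €1B quoted above.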
Market demand should also increase with the adoption of CRISPR/Cas9 and growth in synthetic biology, thus driving down the price through improvements to DNA synthesis chemistry.
Effective Access Memories
In addition to the high cost of producing DNA, another hurdle has been maximizing the amount of information that can be stored in each nucleotide. A recent paper in Science described a storage method dubbed ‘DNA Fountain’ that achieves 85% of the predicted 1.8 bits that could be stored per nucleotide, an extraordinary 60% increase compared to previous methods.
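To make the idea of bits per nucleotide concrete, here is a minimal Python sketch of the naive mapping in which each of the four bases carries two bits. This is not the DNA Fountain scheme from the paper; the mapping table and function names are assumptions chosen purely for illustration. It shows the theoretical ceiling of 2 bits per nucleotide, against which the predicted 1.8 bits and the 85% figure above can be compared.

```python
# Toy mapping: two bits per nucleotide (an illustrative assumption, not DNA Fountain).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    """Invert encode(): read two bits per base and regroup them into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

message = b"DNA"
strand = encode(message)
assert decode(strand) == message  # round trip recovers the original bytes

naive_bits_per_nt = len(message) * 8 / len(strand)
# Density figures from the text: 85% of a predicted 1.8 bits per nucleotide.
achieved_bits_per_nt = 0.85 * 1.8

print(strand)
print(f"naive: {naive_bits_per_nt:.2f} bits/nt, reported: {achieved_bits_per_nt:.2f} bits/nt")
```

Taking the article's figures at face value, 85% of 1.8 bits works out to roughly 1.5 bits per nucleotide, compared with 2 bits for this idealized mapping, which has no error tolerance at all.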
Intriguingly, the authors propose that less refined DNA could be used to store the data, as strong decoding tools can enable “perfect decoding of the data from conditions that are well below the initial quality and quantity of the oligo manufacturer while still approaching the information capacity.” Thus, with improved technologies for accessing information stored in DNA, we may not need to wait for the cost of DNA synthesis to dip dramatically.
Current prices reflect high-quality DNA for traditional uses in synthetic biology that are sensitive to errors. So-called ‘quick and dirty’ DNA preparations that sacrifice purity but use less material and take less time to synthesize might suffice for DNA data storage.
A Chance for European Synthetic Biology Clusters?
Hand in hand with the influx of information comes the importance of data security. Since DNA data storage represents a stable, low-energy method of storing large quantities of information that can be passed on to future generations, European biotech companies will strive to be well positioned for the emergence of this technology. This strategy is evidenced by initiatives of the public-private partnership SynbiCITE, such as the creation of 5 different DNA foundries across the UK to stimulate research and the commercialisation of products in the realm of synthetic biology.
As DNA synthesis chemistry costs come down, it will be interesting to watch for the emergence of spin-offs from both the synthetic biology community and its computational counterpart, such as the European Bioinformatics Institute, which has been home to some of the pioneering work in the field of DNA data storage.
Alison Hirukawa is interested in emerging technologies and dreams of playing beach volleyball in Rio de Janeiro. She is currently at the finish line of her PhD on cancer epigenetics from McGill University in Montréal.