Multidimensional data organization and random access in large-scale DNA storage systems

Authors
Song, Xin [1, 2, 3]
Shah, Shalin [1, 3]
Reif, John [1, 3]
Affiliations
[1] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708 USA
[2] Duke Univ, Dept Biomed Engn, Durham, NC 27708 USA
[3] Duke Univ, Dept Comp Sci, Durham, NC 27708 USA
Keywords
DNA storage; Hierarchical memory; Data random access; Nested PCR; Amplification bias; PCR stochasticity; Primer; PCR
DOI
10.1016/j.tcs.2021.09.021
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
With impressive physical density and molecular-scale coding capacity, DNA is a promising substrate for building long-lasting data archival storage systems. To retrieve data from DNA storage, recent implementations typically rely on large libraries of meticulously designed orthogonal PCR primers, which fundamentally limit the capacity and scalability of practical DNA storage. This work combines nested and semi-nested PCR to enable multidimensional data organization and random access in large DNA storage. Our strategy effectively pushes the limit of DNA storage capacity and dramatically reduces the number of orthogonal primers needed for efficient PCR random access. Our design uses only k * n primers to uniquely address n^k data-encoding oligos. The architecture inherently supports various well-defined PCR random-access patterns that can be tailored to organize and preserve the underlying DNA-encoded data structures and relations in simple database-like formats such as rows, columns, tables, and blocks of data entries. We design in silico PCR experiments of a four-dimensional DNA storage to illustrate the mechanisms of sixteen different random-access patterns, each requiring no more than two PCR reactions to selectively amplify a target dataset of various sizes. To better approximate the physical system, we formulate mathematical models based on empirical distributions to analyze the effect of pipetting, PCR bias, and PCR stochasticity on the performance of multidimensional data queries from large DNA storage. © 2021 Elsevier B.V. All rights reserved.
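The addressing claim in the abstract (k * n primers for n^k oligos) can be illustrated with a small combinatorial sketch. This is not the authors' implementation and does not model nested PCR chemistry; it only assumes that each of the k dimensions has its own set of n primers and that an oligo is identified by one primer index per dimension. The function names (`primer_count`, `primers_for`, `select`) and the values n = 10, k = 4 are hypothetical, chosen for illustration.

```python
from itertools import product


def primer_count(n: int, k: int) -> int:
    """Primers needed when each of k dimensions has n primers (assumed layout)."""
    return k * n


def address_count(n: int, k: int) -> int:
    """Distinct k-dimensional addresses, one index in range(n) per dimension."""
    return n ** k


def primers_for(address: tuple, n: int) -> list:
    """Return the (dimension, index) primer IDs that identify one address."""
    for idx in address:
        if not 0 <= idx < n:
            raise ValueError(f"index {idx} is outside range(0, {n})")
    return [(dim, idx) for dim, idx in enumerate(address)]


def select(n: int, k: int, query: dict) -> list:
    """List all addresses matching a partial query.

    `query` maps a dimension to a fixed index; dimensions that are not
    fixed range over all n values, so fixing m dimensions selects
    n ** (k - m) addresses (a row, column, table, or block).
    """
    axes = [[query[d]] if d in query else range(n) for d in range(k)]
    return list(product(*axes))


if __name__ == "__main__":
    n, k = 10, 4  # illustrative values only
    print(primer_count(n, k), "primers address", address_count(n, k), "oligos")
    print(primers_for((3, 1, 7, 0), n))
    print(len(select(n, k, {0: 3, 2: 7})), "oligos match the partial query")
```

Under these assumptions, fixing every dimension selects a single oligo, while leaving dimensions free selects progressively larger blocks, which mirrors the row, column, table, and block access patterns described in the abstract.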
Pages: 190-202
Page count: 13
Published in: Theoretical Computer Science, 2021, 894: 190-202