Robust data storage in DNA by de Bruijn graph-based de novo strand assembly

Cited by: 25
Authors
Song, Lifu [1 ,2 ,3 ]
Geng, Feng [4 ]
Gong, Zi-Yi [1 ,2 ,3 ]
Chen, Xin [5 ]
Tang, Jijun [6 ,7 ]
Gong, Chunye [8 ]
Zhou, Libang [9 ]
Xia, Rui [8 ]
Han, Ming-Zhe [1 ,2 ,3 ]
Xu, Jing-Yi [1 ,2 ,3 ]
Li, Bing-Zhi [1 ,2 ,3 ]
Yuan, Ying-Jin [1 ,2 ,3 ]
Affiliations
[1] Tianjin Univ, Frontiers Sci Ctr Synthet Biol, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Key Lab Syst Bioengn, Minist Educ, Tianjin 300072, Peoples R China
[3] Tianjin Univ, Sch Chem Engn & Technol, Tianjin 300072, Peoples R China
[4] Binzhou Med Univ, Coll Pharm, Yantai 264003, Shandong, Peoples R China
[5] Tianjin Univ, Center Appl Math, Tianjin 300072, Peoples R China
[6] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Sci & Technol, Tianjin 300350, Peoples R China
[7] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[8] Natl SuperComp Ctr Tianjin, Tianjin 300457, Peoples R China
[9] Nanjing Agr Univ, College Food Sci & Technol, Nanjing 210095, Jiangsu, Peoples R China
Keywords
MULTIPLE SEQUENCE ALIGNMENT; DIGITAL INFORMATION; SYNTHETIC DNA; ERROR RATES; RECONSTRUCTION;
DOI
10.1038/s41467-022-33046-w
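Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract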
DNA data storage is a rapidly developing technology with great potential due to its high density, long-term durability, and low maintenance cost. The major technical challenges include various errors, such as strand breaks, rearrangements, and indels, that frequently arise during DNA synthesis, amplification, sequencing, and preservation. In this study, a de novo strand assembly algorithm (DBGPS) is developed using a de Bruijn graph and greedy path search to meet these challenges. DBGPS shows substantial advantages in handling DNA breaks, rearrangements, and indels. The robustness of DBGPS is demonstrated by accelerated aging, multiple independent data retrievals, deep error-prone PCR, and large-scale simulations. Remarkably, 6.8 MB of data is accurately recovered from a severely corrupted sample that has been treated at 70 °C for 70 days. With DBGPS, we are able to achieve a logical density of 1.30 bits/cycle and a physical density of 295 PB/g.
Pages: 9
Related papers
50 in total
  • [21] DNA Assembly with de Bruijn Graphs on FPGA
    Poirier, Carl
    Gosselin, Benoit
    Fortier, Paul
    [J]. 2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2015, : 6489 - 6492
  • [22] IDBA-tran: a more robust de novo de Bruijn graph assembler for transcriptomes with uneven expression levels
    Peng, Yu
    Leung, Henry C. M.
    Yiu, Siu-Ming
    Lv, Ming-Ju
    Zhu, Xin-Guang
    Chin, Francis Y. L.
    [J]. BIOINFORMATICS, 2013, 29 (13) : 326 - 334
  • [23] T-IDBA: A de novo Iterative de Bruijn Graph Assembler for Transcriptome
    Peng, Yu
    Leung, Henry C. M.
    Yiu, S. M.
    Chin, Francis Y. L.
    [J]. RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, 2011, 6577 : 337 - 338
  • [24] Velvet: Algorithms for de novo short read assembly using de Bruijn graphs
    Zerbino, Daniel R.
    Birney, Ewan
    [J]. GENOME RESEARCH, 2008, 18 (05) : 821 - 829
  • [25] Combining De Bruijn Graphs, Overlap Graphs and Microassembly for De Novo Genome Assembly
    Sergushichev, A. A.
    Alexandrov, A. V.
    Kazakov, S. V.
    Tsarev, F. N.
    Shalyto, A. A.
    [J]. IZVESTIYA SARATOVSKOGO UNIVERSITETA NOVAYA SERIYA-MATEMATIKA MEKHANIKA INFORMATIKA, 2013, 13 (02) : 10 - 10
  • [26] Parallelized De Bruijn graph construction and simplification for genome assembly
    [J]. Cheng, J.-F. (jiefengcheng@gmail.com), 1600, Chinese Academy of Sciences (24):
  • [27] Utilizing de Bruijn graph of metagenome assembly for metatranscriptome analysis
    Ye, Yuzhen
    Tang, Haixu
    [J]. BIOINFORMATICS, 2016, 32 (07) : 1001 - 1008
  • [28] Graph Theoretical Strategies in De Novo Assembly
    Behizadi, Kimia
    Jafarzadeh, Nafiseh
    Iranmanesh, Ali
    [J]. IEEE ACCESS, 2022, 10 : 9328 - 9339
  • [29] Approaches to DNA de novo assembly
    Sovic, Ivan
    Skala, Karolj
    Sikic, Mile
    [J]. 2013 36TH INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2013, : 351 - 359
  • [30] deSALT: fast and accurate long transcriptomic read alignment with de Bruijn graph-based index
    Liu, Bo
    Liu, Yadong
    Li, Junyi
    Guo, Hongzhe
    Zang, Tianyi
    Wang, Yadong
    [J]. GENOME BIOLOGY, 2019, 20 (01)