Robust data storage in DNA by de Bruijn graph-based de novo strand assembly

Cited by: 35
Authors
Song, Lifu [1, 2, 3]
Geng, Feng [4]
Gong, Zi-Yi [1, 2, 3]
Chen, Xin [5]
Tang, Jijun [6, 7]
Gong, Chunye [8]
Zhou, Libang [9]
Xia, Rui [8]
Han, Ming-Zhe [1, 2, 3]
Xu, Jing-Yi [1, 2, 3]
Li, Bing-Zhi [1, 2, 3]
Yuan, Ying-Jin [1, 2, 3]
Affiliations
[1] Tianjin Univ, Frontiers Sci Ctr Synthet Biol, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Key Lab Syst Bioengn, Minist Educ, Tianjin 300072, Peoples R China
[3] Tianjin Univ, Sch Chem Engn & Technol, Tianjin 300072, Peoples R China
[4] Binzhou Med Univ, Coll Pharm, Yantai 264003, Shandong, Peoples R China
[5] Tianjin Univ, Ctr Appl Math, Tianjin 300072, Peoples R China
[6] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Sci & Technol, Tianjin 300350, Peoples R China
[7] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[8] Natl SuperComp Ctr Tianjin, Tianjin 300457, Peoples R China
[9] Nanjing Agr Univ, Coll Food Sci & Technol, Nanjing 210095, Jiangsu, Peoples R China
Keywords
MULTIPLE SEQUENCE ALIGNMENT; DIGITAL INFORMATION; SYNTHETIC DNA; ERROR RATES; RECONSTRUCTION;
DOI
10.1038/s41467-022-33046-w
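Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes
07; 0710; 09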
Abstract
DNA data storage is a rapidly developing technology with great potential due to its high density, long-term durability, and low maintenance cost. The major technical challenges include various errors, such as strand breaks, rearrangements, and indels, that frequently arise during DNA synthesis, amplification, sequencing, and preservation. In this study, a de novo strand assembly algorithm (DBGPS) is developed using a de Bruijn graph and greedy path search to meet these challenges. DBGPS shows substantial advantages in handling DNA breaks, rearrangements, and indels. The robustness of DBGPS is demonstrated by accelerated aging, multiple independent data retrievals, deep error-prone PCR, and large-scale simulations. Remarkably, 6.8 MB of data is accurately recovered from a severely corrupted sample that has been treated at 70 °C for 70 days. With DBGPS, we are able to achieve a logical density of 1.30 bits/cycle and a physical density of 295 PB/g.
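The general idea named in the abstract, building a de Bruijn graph from reads and recovering a strand by greedy path search, can be illustrated with a minimal sketch. This is not the authors' DBGPS implementation. It assumes that the first k bases of the strand are known and serve as the seed, that k-mers seen fewer than `min_count` times are treated as noise, and that the most frequent successor k-mer is always chosen. The function names, the parameters, and the toy sequence are all hypothetical.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer in the reads; the k-mers are the edges of the de Bruijn graph."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def greedy_assemble(counts, seed, length, min_count=2):
    """Extend `seed` one base at a time along the best-supported k-mer (assumes k >= 2)."""
    k = len(seed)
    strand = seed
    visited = {seed}  # k-mers already used, to avoid walking in a cycle
    while len(strand) < length:
        suffix = strand[-(k - 1):]
        candidates = [
            (counts[suffix + base], base)
            for base in "ACGT"
            if counts[suffix + base] >= min_count and suffix + base not in visited
        ]
        if not candidates:
            break  # no supported extension: return the partial strand
        _, base = max(candidates)  # greedy choice: highest k-mer count
        strand += base
        visited.add(suffix + base)
    return strand

# Toy example (hypothetical data): a 20-nt strand seen only as broken fragments,
# plus one fragment carrying a substitution error.
strand = "ACGTTGCATGGACCTAGTCA"
fragments = [strand[0:12], strand[5:20], strand[3:16]]
reads = fragments * 2 + ["ACGTTGAATGGA"]
recovered = greedy_assemble(count_kmers(reads, 6), strand[:6], len(strand))
print(recovered == strand)
```

Because the graph is built from k-mers rather than whole reads, no single read has to span the full strand. Overlapping fragments are enough, which is why this style of assembly tolerates strand breaks. The count threshold discards the k-mers introduced by the erroneous read.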
Pages: 9