Robust data storage in DNA by de Bruijn graph-based de novo strand assembly

Cited by: 22
Authors
Song, Lifu [1 ,2 ,3 ]
Geng, Feng [4 ]
Gong, Zi-Yi [1 ,2 ,3 ]
Chen, Xin [5 ]
Tang, Jijun [6 ,7 ]
Gong, Chunye [8 ]
Zhou, Libang [9 ]
Xia, Rui [8 ]
Han, Ming-Zhe [1 ,2 ,3 ]
Xu, Jing-Yi [1 ,2 ,3 ]
Li, Bing-Zhi [1 ,2 ,3 ]
Yuan, Ying-Jin [1 ,2 ,3 ]
Affiliations
[1] Tianjin Univ, Frontiers Sci Ctr Synthet Biol, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Key Lab Syst Bioengn, Minist Educ, Tianjin 300072, Peoples R China
[3] Tianjin Univ, Sch Chem Engn & Technol, Tianjin 300072, Peoples R China
[4] Binzhou Med Univ, Coll Pharm, Yantai 264003, Shandong, Peoples R China
[5] Tianjin Univ, Center Appl Math, Tianjin 300072, Peoples R China
[6] Tianjin Univ, Coll Intelligence & Comp, Sch Comp Sci & Technol, Tianjin 300350, Peoples R China
[7] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[8] Natl SuperComp Ctr Tianjin, Tianjin 300457, Peoples R China
[9] Nanjing Agr Univ, College Food Sci & Technol, Nanjing 210095, Jiangsu, Peoples R China
Keywords
MULTIPLE SEQUENCE ALIGNMENT; DIGITAL INFORMATION; SYNTHETIC DNA; ERROR RATES; RECONSTRUCTION;
DOI
10.1038/s41467-022-33046-w
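Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract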
DNA data storage is a rapidly developing technology with great potential due to its high density, long-term durability, and low maintenance cost. The major technical challenges include various errors, such as strand breaks, rearrangements, and indels that frequently arise during DNA synthesis, amplification, sequencing, and preservation. In this study, a de novo strand assembly algorithm (DBGPS) is developed using a de Bruijn graph and greedy path search to meet these challenges. DBGPS shows substantial advantages in handling DNA breaks, rearrangements, and indels. The robustness of DBGPS is demonstrated by accelerated aging, multiple independent data retrievals, deep error-prone PCR, and large-scale simulations. Remarkably, 6.8 MB of data is accurately recovered from a severely corrupted sample that has been treated at 70 °C for 70 days. With DBGPS, we are able to achieve a logical density of 1.30 bits/cycle and a physical density of 295 PB/g.
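The abstract names two ingredients, a de Bruijn graph and a greedy path search, without giving details. The sketch below is only a rough illustration of that general idea, not the authors' DBGPS implementation: it counts k-mers across noisy reads and extends a known starting k-mer one base at a time, always following the best-supported successor. The function names `count_kmers` and `greedy_assemble`, the `seed` argument (standing in for a known strand prefix), and the `min_count` threshold are all hypothetical choices made for this example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across all reads (the edges of a de Bruijn graph)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_assemble(counts, seed, length, min_count=2):
    """Extend `seed` base by base, following the most frequent next k-mer.

    Assumes k = len(seed) >= 2. Stops early when no successor reaches
    `min_count`, which is an illustrative noise filter, not a DBGPS rule.
    """
    k = len(seed)
    strand = seed
    while len(strand) < length:
        suffix = strand[-(k - 1):]
        # Score the four possible extensions by how often their k-mer was seen.
        best_count, best_base = max((counts[suffix + b], b) for b in "ACGT")
        if best_count < min_count:
            break
        strand += best_base
    return strand


# Broken fragments of one strand plus a single read carrying a substitution.
strand = "ACGTTGCAAGCTTAGGCTAC"
reads = [strand[:12], strand[4:], strand[:10], strand[6:], strand[2:16], "TGCAATCTTA"]
recovered = greedy_assemble(count_kmers(reads, 5), strand[:5], len(strand))
```

In this toy input no single read spans the whole strand, yet the overlapping k-mers let the path search rebuild it, and the erroneous k-mers are outvoted because each appears only once.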
Pages: 9
Related papers
50 in total
  • [1] Robust data storage in DNA by de Bruijn graph-based de novo strand assembly
    Lifu Song
    Feng Geng
    Zi-Yi Gong
    Xin Chen
    Jijun Tang
    Chunye Gong
    Libang Zhou
    Rui Xia
    Ming-Zhe Han
    Jing-Yi Xu
    Bing-Zhi Li
    Ying-Jin Yuan
    [J]. Nature Communications, 13
  • [2] A Classification of de Bruijn Graph Approaches for De Novo Fragment Assembly
    de Armas, Elvismary Molina
    Holanda, Maristela
    de Oliveira, Daniel
    Almeida, Nalvo F.
    Lifschitz, Sergio
    [J]. ADVANCES IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, BSB 2020, 2020, 12558 : 1 - 12
  • [3] Parallel De Bruijn Graph Construction and Traversal for De Novo Genome Assembly
    Georganas, Evangelos
    Buluc, Aydin
    Chapman, Jarrod
    Oliker, Leonid
    Rokhsar, Daniel
    Yelick, Katherine
    [J]. SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 437 - 448
  • [4] Exploration of de Bruijn graph filtering for de novo assembly using GraphLab
    Collet, Julien
    Sassolas, Tanguy
    Lhuillier, Yves
    Sirdey, Renaud
    Carlier, Jacques
    [J]. 2017 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2017, : 530 - 539
  • [5] Accelerating de Bruijn Graph-based Genome Assembly for High-Throughput Short Read Data
    Zhao, Kun
    Liu, Weiguo
    Voss, Gerrit
    Mueller-Wittig, Wolfgang
    [J]. 2013 19TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2013), 2013, : 426 - 427
  • [6] PANDA: Processing in Magnetic Random-Access Memory-Accelerated de Bruijn Graph-Based DNA Assembly
    Angizi, Shaahin
    Fahmi, Naima Ahmed
    Najafi, Deniz
    Zhang, Wei
    Fan, Deliang
    [J]. JOURNAL OF LOW POWER ELECTRONICS AND APPLICATIONS, 2024, 14 (01)
  • [7] Proposal of a New Method for de Novo DNA Sequence Assembly Using de Bruijn Graphs
    Couto, Adriano Donato
    Cerqueira, Fabio Ribeiro
    Ferreira, Ricardo dos Santos
    Oliveira, Alcione de Paiva
    [J]. INFORMATION SCIENCES AND SYSTEMS 2015, 2016, 363 : 307 - 317
  • [8] IDBA - A Practical Iterative de Bruijn Graph De Novo Assembler
    Peng, Yu
    Leung, Henry C. M.
    Yiu, S. M.
    Chin, Francis Y. L.
    [J]. RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, PROCEEDINGS, 2010, 6044 : 426 - 440
  • [9] deBGA: read alignment with de Bruijn graph-based seed and extension
    Liu, Bo
    Guo, Hongzhe
    Brudno, Michael
    Wang, Yadong
    [J]. BIOINFORMATICS, 2016, 32 (21) : 3224 - 3232
  • [10] A de novo Genome Assembler based on MapReduce and Bi-directed de Bruijn Graph
    Zhang, Yuehua
    Xuan, Pengfei
    Wang, Yunsheng
    Srimani, Pradip K.
    Luo, Feng
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2016, : 65 - 71