Efficient DNA-based data storage using shortmer combinatorial encoding

Cited by: 6
Authors
Preuss I. [1, 3]
Rosenberg M. [2]
Yakhini Z. [1, 3]
Anavy L. [1, 3]
Affiliations
[1] School of Computer Science, Reichman University, Herzliya
[2] Institute of Nanotechnology and Advanced Materials, The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan
[3] Faculty of Computer Science, Technion, Haifa
DOI
10.1038/s41598-024-58386-z
Abstract
Data storage in DNA has recently emerged as a promising archival solution, offering space-efficient and long-lasting digital storage solutions. Recent studies suggest leveraging the inherent redundancy of synthesis and sequencing technologies by using composite DNA alphabets. A major challenge of this approach involves the noisy inference process, obstructing large composite alphabets. This paper introduces a novel approach for DNA-based data storage, offering, in some implementations, a 6.5-fold increase in logical density over standard DNA-based storage systems, with near-zero reconstruction error. Combinatorial DNA encoding uses a set of clearly distinguishable DNA shortmers to construct large combinatorial alphabets, where each letter consists of a subset of shortmers. We formally define various combinatorial encoding schemes and investigate their theoretical properties. These include information density and reconstruction probabilities, as well as required synthesis and sequencing multiplicities. We then propose an end-to-end design for a combinatorial DNA-based data storage system, including encoding schemes, two-dimensional (2D) error correction codes, and reconstruction algorithms, under different error regimes. We performed simulations and show, for example, that the use of 2D Reed-Solomon error correction has significantly improved reconstruction rates. We validated our approach by constructing two combinatorial sequences using Gibson assembly, imitating a 4-cycle combinatorial synthesis process. We confirmed the successful reconstruction, and established the robustness of our approach for different error types. Subsampling experiments supported the important role of sampling rate and its effect on the overall performance. Our work demonstrates the potential of combinatorial shortmer encoding for DNA-based data storage and describes some theoretical research questions and technical challenges. Combining combinatorial principles with error-correcting strategies, and investing in the development of DNA synthesis technologies that efficiently support combinatorial synthesis, can pave the way to efficient, error-resilient DNA-based storage solutions. © The Author(s) 2024.
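The abstract describes an alphabet in which each letter is a subset of distinguishable shortmers. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a hypothetical pool of 16 shortmers with exactly 5 chosen per letter; these parameters, and the function names, are illustrative and are not taken from the abstract. It shows how the alphabet size and the bits carried per letter follow from the binomial coefficient, and how an observed set of shortmers maps back to a value when no shortmer is missing or spurious.

```python
from itertools import combinations
from math import comb, log2

# Hypothetical parameters, chosen only for illustration.
N_SHORTMERS = 16   # size of the shortmer pool (assumption)
K_PER_LETTER = 5   # shortmers mixed into each letter (assumption)


def build_alphabet(n, k):
    """List every k-subset of shortmer indices; each subset is one letter."""
    return list(combinations(range(n), k))


def bits_per_letter(n, k):
    """Information carried by one letter when all C(n, k) subsets are usable."""
    return log2(comb(n, k))


def encode(value, alphabet):
    """Map an integer value to the subset of shortmers representing it."""
    return alphabet[value]


def decode(observed_shortmers, index):
    """Map an observed set of shortmer indices back to its integer value.

    Assumes error-free observation: exactly the k shortmers of the letter.
    """
    return index[tuple(sorted(observed_shortmers))]


alphabet = build_alphabet(N_SHORTMERS, K_PER_LETTER)
index = {letter: value for value, letter in enumerate(alphabet)}

letter = encode(1234, alphabet)
recovered = decode(set(letter), index)

print(len(alphabet), round(bits_per_letter(N_SHORTMERS, K_PER_LETTER), 2))
print(letter, recovered)
```

Under these assumed parameters the alphabet has C(16, 5) = 4368 letters, so a single letter carries just over 12 bits. A practical decoder would also have to tolerate missing or spurious shortmers in the sequencing reads, which is where the error correction mentioned in the abstract comes in.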
Related papers (50 in total)
  • [32] Evolutionary approach to construct robust codes for DNA-based data storage
    Rasool, Abdur
    Jiang, Qingshan
    Wang, Yang
    Huang, Xiaoluo
    Qu, Qiang
    Dai, Junbiao
    FRONTIERS IN GENETICS, 2023, 14
  • [33] Plenty of Room at the Bottom: Ten Years of DNA-Based Data Storage
    Kiah, Han Mao
    Siegel, Paul H.
    Yaakobi, Eitan
    IEEE TRANSACTIONS ON MOLECULAR BIOLOGICAL AND MULTI-SCALE COMMUNICATIONS, 2024, 10 (02) : 249 - 252
  • [34] DNA-Based Storage: Trends and Methods
    Yazdi, S. M. Hossein Tabatabaei
    Kiah, Han Mao
    Garcia-Ruiz, Eva
    Ma, Jian
    Zhao, Huimin
    Milenkovic, Olgica
    IEEE Transactions on Molecular, Biological, and Multi-Scale Communications, 2015, 1 (03) : 230 - 248
  • [35] Promiscuous molecules for smarter file operations in DNA-based data storage
    Tomek, Kyle J.
    Volkel, Kevin
    Indermaur, Elaine W.
    Tuck, James M.
    Keung, Albert J.
    NATURE COMMUNICATIONS, 2021, 12 (01)
  • [36] DNA-Based Data Storage Systems: A Review of Implementations and Code Constructions
    Milenkovic, Olgica
    Pan, Chao
    IEEE TRANSACTIONS ON COMMUNICATIONS, 2024, 72 (07) : 3803 - 3828
  • [37] On Single-Error-Detecting Codes for DNA-Based Data Storage
    Weber, Jos H.
    de Groot, Joost A. M.
    van Leeuwen, Charlot J.
    IEEE COMMUNICATIONS LETTERS, 2021, 25 (01) : 41 - 44
  • [38] A DNA-Based Archival Storage System
    Bornholt, James
    Lopez, Randolph
    Carmean, Douglas M.
    Ceze, Luis
    Seelig, Georg
    Strauss, Karin
    ACM SIGPLAN NOTICES, 2016, 51 (04) : 637 - 649
  • [39] Encoding Movies and Data in DNA Storage
    Goela, Naveen
    Bolot, Jean
    2016 INFORMATION THEORY AND APPLICATIONS WORKSHOP (ITA), 2016
  • [40] A Robust and Efficient DNA Storage Architecture Based on Modulation Encoding and Decoding
    Zan, Xiangzhen
    Xie, Ranze
    Yao, Xiangyu
    Xu, Peng
    Liu, Wenbin
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2023, 63 (12) : 3967 - 3976