Efficient DNA-based data storage using shortmer combinatorial encoding

Cited by: 6
Authors
Preuss I. [1, 3]
Rosenberg M. [2]
Yakhini Z. [1, 3]
Anavy L. [1, 3]
Affiliations
[1] School of Computer Science, Reichman University, Herzliya
[2] Institute of Nanotechnology and Advanced Materials, The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan
[3] Faculty of Computer Science, Technion, Haifa
DOI
10.1038/s41598-024-58386-z
Abstract
Data storage in DNA has recently emerged as a promising archival solution, offering space-efficient and long-lasting digital storage. Recent studies suggest leveraging the inherent redundancy of synthesis and sequencing technologies by using composite DNA alphabets. A major challenge of this approach is the noisy inference process, which hinders the use of large composite alphabets. This paper introduces a novel approach for DNA-based data storage that offers, in some implementations, a 6.5-fold increase in logical density over standard DNA-based storage systems, with near-zero reconstruction error. Combinatorial DNA encoding uses a set of clearly distinguishable DNA shortmers to construct large combinatorial alphabets, in which each letter consists of a subset of shortmers. We formally define various combinatorial encoding schemes and investigate their theoretical properties, including information density, reconstruction probabilities, and the required synthesis and sequencing multiplicities. We then propose an end-to-end design for a combinatorial DNA-based data storage system, including encoding schemes, two-dimensional (2D) error correction codes, and reconstruction algorithms, under different error regimes. Our simulations show, for example, that 2D Reed-Solomon error correction significantly improves reconstruction rates. We validated our approach by constructing two combinatorial sequences using Gibson assembly, imitating a 4-cycle combinatorial synthesis process. We confirmed successful reconstruction and established the robustness of our approach to different error types. Subsampling experiments highlighted the important role of the sampling rate and its effect on overall performance. Our work demonstrates the potential of combinatorial shortmer encoding for DNA-based data storage and outlines several theoretical research questions and technical challenges.
Combining combinatorial principles with error-correcting strategies, and investing in the development of DNA synthesis technologies that efficiently support combinatorial synthesis, can pave the way to efficient, error-resilient DNA-based storage solutions. © The Author(s) 2024.
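To make the idea of a combinatorial alphabet concrete, the following Python sketch treats each letter as a k-subset of n distinguishable shortmers, with shortmers represented only by integer indices. All function names and the parameters n = 16, k = 5 are illustrative assumptions, not the authors' implementation or settings. The sketch counts the letters, computes how many whole bits one letter can carry, maps an integer to a letter by lexicographic unranking, and, under a simplified error-free model (assumed here) in which each sequencing read reveals one of the letter's k shortmers uniformly at random, computes the probability that all k shortmers are observed. The standard DNA alphabet is the special case n = 4, k = 1, which gives 2 bits per position.

```python
from itertools import combinations
from math import comb

# Illustrative sketch with hypothetical helpers (not the authors' code):
# a letter is a k-subset of n distinguishable shortmers, indexed 0..n-1.


def alphabet_size(n: int, k: int) -> int:
    """Number of distinct letters: C(n, k)."""
    return comb(n, k)


def bits_per_letter(n: int, k: int) -> int:
    """Whole bits per letter if only the first 2**b letters are used (exact floor(log2))."""
    return comb(n, k).bit_length() - 1


def unrank_subset(rank: int, n: int, k: int) -> tuple[int, ...]:
    """Map an integer in [0, C(n, k)) to the rank-th k-subset in lexicographic order."""
    if not 0 <= rank < comb(n, k):
        raise ValueError("rank out of range")
    subset = []
    x = 0  # smallest shortmer index still available
    for remaining in range(k, 0, -1):
        # Skip each block of subsets whose next element is x;
        # such a block contains C(n - x - 1, remaining - 1) subsets.
        while comb(n - x - 1, remaining - 1) <= rank:
            rank -= comb(n - x - 1, remaining - 1)
            x += 1
        subset.append(x)
        x += 1
    return tuple(subset)


def prob_all_observed(k: int, reads: int) -> float:
    """P(all k shortmers seen at least once) when each read shows one of them
    uniformly at random (assumed error-free model), by inclusion-exclusion."""
    if k < 1:
        raise ValueError("k must be positive")
    return sum((-1) ** i * comb(k, i) * (1 - i / k) ** reads for i in range(k + 1))


if __name__ == "__main__":
    # Sanity check: unranking reproduces itertools' lexicographic order.
    assert [unrank_subset(r, 6, 3) for r in range(comb(6, 3))] == list(combinations(range(6), 3))
    n, k = 16, 5  # illustrative parameters only
    print(alphabet_size(n, k), bits_per_letter(n, k))  # 4368 12
    print(unrank_subset(0, n, k))  # (0, 1, 2, 3, 4)
    for reads in (5, 10, 20):
        print(reads, round(prob_all_observed(k, reads), 3))
```

Using only whole bits per letter keeps the mapping from binary data simple at the cost of leaving some letters unused. In this simplified model, the last function shows why sequencing depth and sampling rate become more important as the number of shortmers per letter grows.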