Efficient DNA-based data storage using shortmer combinatorial encoding

Cited by: 0
Authors
Preuss, Inbal [1, 3]
Rosenberg, Michael [2]
Yakhini, Zohar [1, 3]
Anavy, Leon [1, 3]
Affiliations
[1] Reichman Univ, Sch Comp Sci, IL-4610101 Herzliyya, Israel
[2] Bar Ilan Univ, Inst Nanotechnol & Adv Mat, Mina & Everard Goodman Fac Life Sci, IL-5290002 Ramat Gan, Israel
[3] Fac Comp Sci, Technion, IL-3200003 Haifa, Israel
Source
SCIENTIFIC REPORTS | 2024 / Volume 14 / Issue 01
Keywords
INFORMATION-STORAGE; GENERATION;
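DOI
Not available
Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract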
Data storage in DNA has recently emerged as a promising archival solution, offering space-efficient and long-lasting digital storage solutions. Recent studies suggest leveraging the inherent redundancy of synthesis and sequencing technologies by using composite DNA alphabets. A major challenge of this approach involves the noisy inference process, obstructing large composite alphabets. This paper introduces a novel approach for DNA-based data storage, offering, in some implementations, a 6.5-fold increase in logical density over standard DNA-based storage systems, with near-zero reconstruction error. Combinatorial DNA encoding uses a set of clearly distinguishable DNA shortmers to construct large combinatorial alphabets, where each letter consists of a subset of shortmers. We formally define various combinatorial encoding schemes and investigate their theoretical properties. These include information density and reconstruction probabilities, as well as required synthesis and sequencing multiplicities. We then propose an end-to-end design for a combinatorial DNA-based data storage system, including encoding schemes, two-dimensional (2D) error correction codes, and reconstruction algorithms, under different error regimes. We performed simulations and show, for example, that the use of 2D Reed-Solomon error correction has significantly improved reconstruction rates. We validated our approach by constructing two combinatorial sequences using Gibson assembly, imitating a 4-cycle combinatorial synthesis process. We confirmed the successful reconstruction, and established the robustness of our approach for different error types. Subsampling experiments supported the important role of sampling rate and its effect on the overall performance. Our work demonstrates the potential of combinatorial shortmer encoding for DNA-based data storage and describes some theoretical research questions and technical challenges. Combining combinatorial principles with error-correcting strategies, and investing in the development of DNA synthesis technologies that efficiently support combinatorial synthesis, can pave the way to efficient, error-resilient DNA-based storage solutions.
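The idea of a combinatorial letter, a subset of shortmers chosen from a fixed set, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the parameters (16 shortmers, 5 per letter), the function names, the lexicographic mapping between integers and subsets, and the top-K counting rule, which is a simplified stand-in for the reconstruction algorithms the abstract mentions. Error correction is not modeled.

```python
from collections import Counter
from itertools import combinations
from math import comb, log2


def bits_per_letter(n_shortmers: int, k: int) -> float:
    """Bits carried by one letter when a letter is any k-subset of n shortmers."""
    return log2(comb(n_shortmers, k))


def unrank_subset(index: int, n_shortmers: int, k: int) -> tuple[int, ...]:
    """Map an integer in [0, C(n, k)) to the index-th k-subset (lexicographic order)."""
    if not 0 <= index < comb(n_shortmers, k):
        raise ValueError("index out of range for this alphabet")
    subset = []
    candidate = 0
    for remaining in range(k, 0, -1):
        # Skip every candidate whose whole block of subsets precedes `index`.
        while True:
            block = comb(n_shortmers - candidate - 1, remaining - 1)
            if index < block:
                break
            index -= block
            candidate += 1
        subset.append(candidate)
        candidate += 1
    return tuple(subset)


def rank_subset(subset: tuple[int, ...], n_shortmers: int) -> int:
    """Inverse of unrank_subset: the lexicographic position of a k-subset."""
    k = len(subset)
    index = 0
    previous = -1
    for position, element in enumerate(sorted(subset)):
        # Count the subsets that place a smaller shortmer at this position.
        for skipped in range(previous + 1, element):
            index += comb(n_shortmers - skipped - 1, k - position - 1)
        previous = element
    return index


def reconstruct_letter(observed_shortmers: list[int], k: int) -> tuple[int, ...]:
    """Hypothetical inference rule: keep the k most frequently observed shortmers."""
    counts = Counter(observed_shortmers)
    return tuple(sorted(s for s, _ in counts.most_common(k)))


if __name__ == "__main__":
    n, k = 16, 5  # illustrative parameters, not taken from the abstract
    print(comb(n, k), "letters,", round(bits_per_letter(n, k), 2), "bits per letter")
    letter = unrank_subset(1234, n, k)
    print(letter, rank_subset(letter, n))
```

With these illustrative parameters the alphabet has C(16, 5) = 4368 letters, compared with 4 letters (2 bits) for a single standard nucleotide position. The counting rule also hints at why the sampling rate matters: with too few reads, a rare erroneous shortmer can outnumber a genuine member of the subset.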
Pages: 17