Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms

Cited by: 68
Authors
Jain, Siddharth [1]
Farnoud, Farzad [1, 2, 3]
Schwartz, Moshe [4]
Bruck, Jehoshua [1]
Affiliations
[1] CALTECH, Dept Elect Engn, Pasadena, CA 91125 USA
[2] Univ Virginia, Dept Elect & Comp Engn, Charlottesville, VA 22903 USA
[3] Univ Virginia, Dept Comp Sci, Charlottesville, VA 22903 USA
[4] Ben Gurion Univ Negev, Dept Elect & Comp Engn, IL-8410501 Beer Sheva, Israel
Funding
National Science Foundation (USA);
Keywords
Error-correcting codes; DNA; string-duplication systems; tandem-duplication errors; TANDEM REPEATS; CAPACITY; EVOLUTION; CHANNELS;
DOI
10.1109/TIT.2017.2688361
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The ability to store data in the DNA of a living organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically modified organisms. Data stored in this medium are subject to errors arising from various mutations, such as point mutations, indels, and tandem duplication, which need to be corrected to maintain data integrity. In this paper, we provide error-correcting codes for errors caused by tandem duplications, which create a copy of a block of the sequence and insert it in a tandem manner, i.e., next to the original. In particular, we present two families of codes for correcting errors due to tandem duplications of a fixed length: the first family can correct any number of errors, while the second corrects a bounded number of errors. We also study codes for correcting tandem duplications of length up to a given constant k, where we are primarily focused on the cases of k = 2, 3. Finally, we provide a full classification of the sets of lengths allowed in tandem duplication that result in a unique root for all sequences.
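The tandem-duplication error described in the abstract can be illustrated with a minimal sketch. The function names `tandem_duplicate` and `remove_tandem_repeats` are hypothetical and are not taken from the paper; the second function simply deletes adjacent repeated blocks of one fixed length until none remain, which is one plausible way to undo fixed-length duplications and is not claimed to be the authors' decoding procedure.

```python
def tandem_duplicate(seq: str, start: int, length: int) -> str:
    """Copy seq[start:start+length] and insert the copy right after the original."""
    if length <= 0 or start < 0 or start + length > len(seq):
        raise ValueError("block must lie inside the sequence")
    end = start + length
    return seq[:end] + seq[start:end] + seq[end:]


def remove_tandem_repeats(seq: str, length: int) -> str:
    """Delete one copy of any adjacent repeated block of the given length,
    repeating until no such repeat is left (illustrative assumption only)."""
    changed = True
    while changed:
        changed = False
        for i in range(len(seq) - 2 * length + 1):
            if seq[i:i + length] == seq[i + length:i + 2 * length]:
                # Keep the first copy, drop the second.
                seq = seq[:i + length] + seq[i + 2 * length:]
                changed = True
                break
    return seq


noisy = tandem_duplicate("ACGT", start=1, length=2)   # duplicates the block "CG"
restored = remove_tandem_repeats(noisy, length=2)
print(noisy, restored)
```

In this example a single duplication of length 2 is applied to a short DNA string and then removed again; strings that already contain a repeat of that length before any error occurs would also be shortened, which is why a code has to restrict which strings are used as codewords.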
Pages: 4996 - 5010
Page count: 15
Related papers
50 in total
  • [31] Lossless data compression with error correcting codes
    Caire, G
    Shamai, S
    Verdú, S
    2003 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY - PROCEEDINGS, 2003, : 22 - 22
  • [32] ARE THERE ANY FRACTALS IN DNA OF LIVING ORGANISMS
    CHATZIDIMITRIOUDREISMANN, CA
    STREFFER, RMF
    LARHAMMAR, D
    BERICHTE DER BUNSEN-GESELLSCHAFT-PHYSICAL CHEMISTRY CHEMICAL PHYSICS, 1994, 98 (09): : 1141 - 1141
  • [33] Error-Correcting Codes for Combinatorial Composite DNA
    Sabary, Omer
    Preuss, Inbal
    Gabrys, Ryan
    Yakhini, Zohar
    Anavy, Leon
    Yaakobi, Eitan
    2024 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, ISIT 2024, 2024, : 109 - 114
  • [34] Codes for DNA Storage Channels
    Kiah, Han Mao
    Puleo, Gregory J.
    Milenkovic, Olgica
    2015 IEEE INFORMATION THEORY WORKSHOP (ITW), 2015,
  • [35] A Crossbreed Data Storage for Sheltered Data Duplication
    Hema, G.
    Manohari, K.
    JOURNAL OF ALGEBRAIC STATISTICS, 2022, 13 (02) : 1480 - 1483
  • [36] Properties and Constructions of Constrained Codes for DNA-Based Data Storage
    Immink, Kees A. Schouhamer
    Cai, Kui
    IEEE ACCESS, 2020, 8 : 49523 - 49531
  • [37] New Construction of Balanced Codes Based on Weights of Data for DNA Storage
    Lu, Xiaozhou
    Kim, Sunghwan
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2023, 11 (04) : 973 - 984
  • [38] Construction of single-deletion-correcting DNA codes using CIS codes
    Choi, Whan-Hyuk
    Kim, Hyun Jin
    Lee, Yoonjin
    DESIGNS CODES AND CRYPTOGRAPHY, 2020, 88 (12) : 2581 - 2596
  • [40] Low-Redundancy Codes for Correcting Multiple Short-Duplication and Edit Errors
    Tang, Yuanyuan
    Wang, Shuche
    Lou, Hao
    Gabrys, Ryan
    Farnoud, Farzad
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2023, 69 (05) : 2940 - 2954