Efficient Encoding/Decoding of GC-Balanced Codes Correcting Tandem Duplications

Cited by: 7
Authors
Chee, Yeow Meng [1]
Chrisnata, Johan [2]
Kiah, Han Mao [2]
Tuan Thanh Nguyen [3]
Affiliations
[1] Natl Univ Singapore, Dept Ind Syst Engn & Management, Singapore 119077, Singapore
[2] Nanyang Technol Univ, Sch Phys & Math Sci, Singapore 639798, Singapore
[3] Singapore Univ Technol & Design, Singapore 487372, Singapore
Keywords
Error-correction codes; DNA storage; tandem duplication; GC-balanced codes; DNA;
DOI
10.1109/TIT.2020.2981069
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Tandem duplication is the process of inserting a copy of a segment of DNA adjacent to its original position. Motivated by applications that store data in living organisms, Jain et al. (2017) proposed the study of codes that correct tandem duplications. All of their code constructions are based on irreducible words. Such constructions are almost optimal for combating tandem duplications of length at most k, where k <= 3. However, the problem of designing efficient encoders and decoders for such codes has not been investigated. In addition, the method cannot be extended to the case of arbitrary k, where k >= 4. In this work, we study efficient encoding/decoding methods for irreducible words over a general q-ary alphabet. Our methods provide the first known efficient encoder/decoder for q-ary codes correcting tandem duplications of length at most k, where k <= 3. In particular, we describe an (l, m)-finite state encoder and show that when m = Theta(1/epsilon) and l = Theta(1/epsilon), the encoder achieves a rate that is epsilon away from the optimal rate. We also provide ranking/unranking algorithms for irreducible words and modify the algorithms to reduce the space requirements of the finite state encoder. Over the DNA alphabet (or quaternary alphabet), we also impose a weight constraint on the codewords. In particular, a quaternary word is GC-balanced if exactly half of its symbols are either C or G. Via a modification of Knuth's balancing technique, we provide an efficient method that translates quaternary messages into GC-balanced codewords, and the resulting codebook is able to correct tandem duplications of length at most k, where k <= 3. In addition, we provide the first known construction of codes to combat tandem duplications of length at most k, where k >= 4. Such codes can correct duplication errors in linear time, and they are almost optimal in terms of rate.
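The three definitions used in the abstract (tandem duplication, irreducible word, GC-balanced word) can be illustrated with a minimal Python sketch. This is not the paper's encoder or decoder. The function names `tandem_duplicate`, `is_irreducible`, and `is_gc_balanced` are hypothetical, and the sketch assumes that a word is irreducible with respect to duplications of length at most k when it contains no adjacent repeated block of length at most k.

```python
def tandem_duplicate(word: str, start: int, length: int) -> str:
    """Insert a copy of word[start:start+length] right after that segment."""
    segment = word[start:start + length]
    if start < 0 or length <= 0 or len(segment) != length:
        raise ValueError("segment must lie inside the word")
    end = start + length
    return word[:end] + segment + word[end:]


def is_irreducible(word: str, max_len: int) -> bool:
    """Assumed definition: no block of length <= max_len occurs twice in a row."""
    for k in range(1, max_len + 1):
        for i in range(len(word) - 2 * k + 1):
            if word[i:i + k] == word[i + k:i + 2 * k]:
                return False
    return True


def is_gc_balanced(word: str) -> bool:
    """True if exactly half of the symbols are C or G."""
    gc_count = sum(1 for symbol in word if symbol in "GC")
    return 2 * gc_count == len(word)


original = "ACGT"
duplicated = tandem_duplicate(original, 1, 2)  # duplicates the segment "CG"
print(duplicated)
print(is_irreducible(original, 3), is_irreducible(duplicated, 3))
print(is_gc_balanced(original), is_gc_balanced(duplicated))
```

In this example the word ACGT is both irreducible and GC-balanced, while a single duplication of the segment CG yields a word that is neither, which is the kind of error the codes in the abstract are designed to correct.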
Pages: 4892-4903
Number of pages: 12
Related papers
28 in total
  • [1] Efficient Encoding/Decoding of Irreducible Words for Codes Correcting Tandem Duplications
    Chee, Yeow Meng
    Chrisnata, Johan
    Kiah, Han Mao
    Tuan Thanh Nguyen
    2018 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2018, : 2406 - 2410
  • [2] Constructions and decoding of GC-balanced codes for edit errors
    Wu, Kenan
    Liu, Shu
    FINITE FIELDS AND THEIR APPLICATIONS, 2024, 95
  • [3] Efficient encoding and decoding schemes for balanced codes
    Youn, JH
    Bose, B
    IEEE TRANSACTIONS ON COMPUTERS, 2003, 52 (09) : 1229 - 1232
  • [4] Balanced codes with parallel encoding and decoding
    Tallini, LG
    Bose, B
    IEEE TRANSACTIONS ON COMPUTERS, 1999, 48 (08) : 794 - 814
  • [5] Asymptotically Optimal Sticky-Insertion-Correcting Codes with Efficient Encoding and Decoding
    Mahdavifar, Hessam
    Vardy, Alexander
    2017 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2017, : 2683 - 2687
  • [6] Some improved encoding and decoding schemes for balanced codes
    Youn, JH
    Bose, B
    2000 PACIFIC RIM INTERNATIONAL SYMPOSIUM ON DEPENDABLE COMPUTING, PROCEEDINGS, 2000, : 103 - 109
  • [7] Transient behavior of the encoding/decoding circuits of error correcting codes
    Lo, JC
    Wan, YL
    Fujiwara, E
    DFT 2005: 20TH IEEE INTERNATIONAL SYMPOSIUM ON DEFECT AND FAULT TOLERANCE IN VLSI SYSTEMS, 2005, : 120 - 128
  • [8] Error-correcting codes for short tandem duplications and at most p substitutions
    Tang, Yuanyuan
    Lou, Hao
    Farnoud, Farzad
    2021 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2021, : 1835 - 1840
  • [9] DESIGN OF EFFICIENT ERROR-CORRECTING BALANCED CODES
    ALBASSAM, S
    BOSE, B
    IEEE TRANSACTIONS ON COMPUTERS, 1993, 42 (10) : 1261 - 1266
  • [10] Low-Power Cooling Codes with Efficient Encoding and Decoding
    Chee, Yeow Meng
    Etzion, Tuvi
    Kiah, Han Mao
    Vardy, Alexander
    Wei, Hengjia
    2018 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2018, : 1655 - 1659