DNA Compression using Referential Compression Algorithm

Authors
Mehta, Kanika [1]
Ghrera, Satya Prakash [1]
Affiliations
[1] Jaypee Univ Informat Technol, Dept Comp Sci & Engn, Solan 173234, Himachal Prades, India
Keywords
Referential Compression; sequences; suffix array; fingerprints;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
With the rapid development of sequencing technology, a very large volume of biological sequence data has been generated. Data compression is employed to reduce the size of this data. In this direction, this paper proposes a new reference-based compression approach. First, a reference is constructed from the common substrings of randomly selected input sequences. The reference is a set of key-value pairs, where each key is a fingerprint (a unique id) and each value is a sequence of characters. Next, the given sequences are compressed using the referential compression algorithm: the input is matched against the reference, and each match found in the input is replaced by its fingerprint from the reference, which reduces the size of the input. The experimental results reported in the paper show that the proposed approach outperforms existing approaches.
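The two steps described in the abstract (build a fingerprint-to-substring reference, then replace matches with fingerprints) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how common substrings are found or how fingerprints are computed (the keywords mention a suffix array). The sketch below assumes fixed-length k-mers as the common substrings and sequential integers as fingerprints, and the names `build_reference`, `compress`, and `decompress` are hypothetical.

```python
from collections import Counter


def build_reference(sequences, k=8, min_count=2):
    """Return {fingerprint: k-mer} for k-mers found in >= min_count sequences.

    Assumption: fixed-length k-mers stand in for "common substrings",
    and fingerprints are simply sequential integer ids.
    """
    counts = Counter()
    for seq in sequences:
        # Use a set so each k-mer is counted once per sequence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    shared = sorted(kmer for kmer, c in counts.items() if c >= min_count)
    return {fid: kmer for fid, kmer in enumerate(shared)}


def compress(seq, reference, k=8):
    """Replace each reference match in seq by its fingerprint.

    Returns a list of tokens: int (fingerprint) or str (literal bases).
    """
    lookup = {kmer: fid for fid, kmer in reference.items()}
    tokens, literal, i = [], [], 0
    while i < len(seq):
        fid = lookup.get(seq[i:i + k])
        if fid is not None:
            if literal:  # flush pending unmatched bases first
                tokens.append("".join(literal))
                literal = []
            tokens.append(fid)
            i += k
        else:
            literal.append(seq[i])
            i += 1
    if literal:
        tokens.append("".join(literal))
    return tokens


def decompress(tokens, reference):
    """Invert compress(): expand fingerprints back into their substrings."""
    return "".join(reference[t] if isinstance(t, int) else t for t in tokens)
```

For example, with the made-up inputs `"ACGTACGTTTGA"` and `"GGACGTACGTCC"` and `k=8`, the only shared k-mer is `ACGTACGT`, so the first sequence becomes one fingerprint followed by the literal `TTGA`. A greedy left-to-right scan is used here for simplicity; it is not necessarily the matching strategy of the paper.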
Pages: 64-69
Page count: 6
Related papers
50 in total
  • [1] ERGC: an efficient referential genome compression algorithm
    Saha, Subrata
    Rajasekaran, Sanguthevar
    BIOINFORMATICS, 2015, 31 (21) : 3468 - 3475
  • [2] NRGC: a novel referential genome compression algorithm
    Saha, Subrata
    Rajasekaran, Sanguthevar
    BIOINFORMATICS, 2016, 32 (22) : 3405 - 3412
  • [3] High efficiency referential genome compression algorithm
    Shi, Wei
    Chen, Jianhua
    Luo, Mao
    Chen, Min
    BIOINFORMATICS, 2019, 35 (12) : 2058 - 2065
  • [4] Referential DNA Data Compression using Hadoop Map Reduce Framework
    Bhukya, Raju
    Deshmuk, Sumit
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2020, 17 (02) : 207 - 214
  • [5] NRRC: A Non-referential Reads Compression Algorithm
    Saha, Subrata
    Rajasekaran, Sanguthevar
    BIOINFORMATICS RESEARCH AND APPLICATIONS (ISBRA 2015), 2015, 9096 : 297 - 308
  • [6] Comment on: 'ERGC: an efficient referential genome compression algorithm'
    Deorowicz, Sebastian
    Grabowski, Szymon
    Ochoa, Idoia
    Hernaez, Mikel
    Weissman, Tsachy
    BIOINFORMATICS, 2016, 32 (07) : 1115 - 1117
  • [7] On-Demand Indexing for Referential Compression of DNA Sequences
    Alves, Fernando
    Cogo, Vinicius
    Wandelt, Sebastian
    Leser, Ulf
    Bessani, Alysson
    PLOS ONE, 2015, 10 (07)
  • [8] A compression algorithm for DNA sequences
    Chen, X
    Kwong, S
    Li, M
    IEEE ENGINEERING IN MEDICINE AND BIOLOGY MAGAZINE, 2001, 20 (04) : 61 - 66
  • [9] Authors' response to 'Comment on: ERGC: An efficient Referential Genome Compression Algorithm'
    Saha, Subrata
    Rajasekaran, Sanguthevar
    BIOINFORMATICS, 2016, 32 (07) : 1118 - 1119
  • [10] DNA Compression using an innovative Index based Coding Algorithm
    Zahra, Shan E.
    Masood, Khalid
    Asif, Muhammad
    2019 22ND IEEE INTERNATIONAL MULTI TOPIC CONFERENCE (INMIC), 2019 : 266 - 271