Bioinformatics features based DNA Sequence data compression algorithm

Cited by: 0
Authors
Ji, Zhen [1]
Zhou, Jia-Rui [2]
Zhu, Ze-Xuan [1]
Wu, Q.H. [3]
Affiliations
[1] College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
[2] College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, Zhejiang 310027, China
[3] Department of Electrical Engineering and Electronics, The University of Liverpool, Liverpool, L69 3GJ, United Kingdom
Source
Keywords
DNA sequences - Data compression - Markov processes - Clustering algorithms - DNA - Benchmarking
DOI
Not available
CLC number
Discipline classification code
Abstract
This paper proposes BioLZMA, a DNA sequence data compression algorithm based on bioinformatics features. BioLZMA slices the DNA sequence data and regroups it into four clusters according to biological meaning: the coding sequence cluster, the intron cluster, the RNA cluster, and the residual cluster. After a pre-processing step that applies a compression strategy targeted to each cluster, the clusters are compressed separately with the LZMA algorithm. Experimental results on benchmark sequences show that BioLZMA outperforms existing DNA compression algorithms. On long DNA sequences with significant bioinformatics features in particular, BioLZMA achieves a higher compression ratio with little computation time.
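The cluster-then-compress idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The annotation format `(start, end, label)`, the label names, and the function names `split_into_clusters` and `compress_clusters` are all assumptions made for illustration. The cluster-specific pre-processing strategies are not described in the abstract and are omitted here. Only the standard-library `lzma` module is used.

```python
import lzma
from typing import Dict, Iterable, List, Tuple

# Hypothetical label names for the three annotated clusters; anything
# not covered by an annotation is assigned to the "residual" cluster.
ANNOTATED = ("cds", "intron", "rna")
CLUSTERS = ANNOTATED + ("residual",)

# Assumed annotation format: (start, end, label), half-open, 0-based.
Annotation = Tuple[int, int, str]


def split_into_clusters(
    sequence: str, annotations: Iterable[Annotation]
) -> Dict[str, str]:
    """Slice a sequence by annotation and regroup the slices per cluster."""
    parts: Dict[str, List[str]] = {name: [] for name in CLUSTERS}
    cursor = 0
    for start, end, label in sorted(annotations):
        if label not in ANNOTATED:
            raise ValueError(f"unknown label: {label!r}")
        if start < cursor or end < start or end > len(sequence):
            raise ValueError("annotations must be in range and non-overlapping")
        # The gap before this annotation is unannotated, hence residual.
        parts["residual"].append(sequence[cursor:start])
        parts[label].append(sequence[start:end])
        cursor = end
    parts["residual"].append(sequence[cursor:])
    return {name: "".join(chunks) for name, chunks in parts.items()}


def compress_clusters(clusters: Dict[str, str]) -> Dict[str, bytes]:
    """Compress each cluster separately with LZMA."""
    return {
        name: lzma.compress(text.encode("ascii"))
        for name, text in clusters.items()
    }
```

This sketch does not record where each slice came from, so it cannot rebuild the original sequence on its own. A lossless scheme would also have to store the segment layout alongside the compressed clusters.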
Pages: 991-995
Related papers
50 in total
  • [21] K-means Clustering Based Compression Algorithm for the High-throughput DNA Sequence
    Tan, Li
    Sun, Jifeng
    2014 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), VOLS 1-2, 2014, : 952 - 955
  • [22] DNA sequence compression
    Korodi, Gergely
    Tabus, Ioan
    Rissanen, Jorma
    Astola, Jaakko
    IEEE SIGNAL PROCESSING MAGAZINE, 2007, 24 (01) : 47 - 53
  • [23] Differential direct coding: a compression algorithm for nucleotide sequence data
    Vey, Gregory
    DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2009,
  • [24] Polynomial Based Representation for DNA Sequence Compression and Search
    Khan, Waqar Ahmad
    Khan, Aftab
    2020 IEEE PUNE SECTION INTERNATIONAL CONFERENCE (PUNECON), 2020, : 202 - 205
  • [25] Algorithm for point cloud compression based on geometrical features
    Qiao S.
    Zhang K.
    Gao K.
    International Journal of Performability Engineering, 2019, 15 (03) : 782 - 791
  • [26] Batch Images Compression Algorithm Based on the Common Features
    Wang, Zhiqiong
    Lin, Zhixiang
    Xu, Lining
    Zhao, Yue
    Xin, Junchang
    2017 10TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI), 2017,
  • [27] DNA sequence splicing algorithm based on Spark
    Pan, Xu
    Fu, Xue-liang
    Dong, Gai-fang
    Li, Hong-hui
    2016 2ND INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS - COMPUTING TECHNOLOGY, INTELLIGENT TECHNOLOGY, INDUSTRIAL INFORMATION INTEGRATION (ICIICII), 2016, : 52 - 56
  • [28] DNA SEQUENCE RECONSTRUCTION BASED ON GENETIC ALGORITHM
    Islam, Md. Rafiqul
    Shahriar, Md. Rowshan
    Shaheed, Abul Faisal Mohammad
    MALAYSIAN JOURNAL OF COMPUTER SCIENCE, 2008, 21 (01) : 13 - 23
  • [29] Seismic data compression based on EZW algorithm
    Xu, Fengtao
    Zhang, Zhengbing
    Gui, Zhixian
    Shiyou Diqiu Wuli Kantan/Oil Geophysical Prospecting, 2015, 50 (05) : 881 - 889
  • [30] BIND - An algorithm for loss-less compression of nucleotide sequence data
    Bose, Tungadri
    Mohammed, Monzoorul Haque
    Dutta, Anirban
    Mande, Sharmila S.
    JOURNAL OF BIOSCIENCES, 2012, 37 (04) : 785 - 789