Bioinformatics features based DNA Sequence data compression algorithm

Cited by: 0
Authors
Ji, Zhen [1]
Zhou, Jia-Rui [2]
Zhu, Ze-Xuan [1]
Wu, Q.H. [3]
Affiliations
[1] College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
[2] College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, Zhejiang 310027, China
[3] Department of Electrical Engineering and Electronics, The University of Liverpool, Liverpool, L69 3GJ, United Kingdom
Source
Keywords
DNA sequences - Data compression - Markov processes - Clustering algorithms - DNA - Benchmarking
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
This paper proposes BioLZMA, a DNA sequence data compression algorithm based on bioinformatics features. BioLZMA slices the DNA sequence data and regroups it into four clusters according to biological meaning: the coding sequence cluster, the intron cluster, the RNA cluster, and the residual cluster. After data pre-processing with a compression strategy targeted at each cluster, the clusters are compressed separately with the LZMA algorithm. Experimental results on benchmark sequences show that BioLZMA outperforms existing DNA compression algorithms. In particular, on long DNA sequences with significant bioinformatics features, BioLZMA achieves a higher compression ratio with little computation time.
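The cluster-then-compress idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the annotation format (sorted, non-overlapping `(start, end, label)` intervals), the function names, and the `layout` table used for reconstruction are all assumptions made for illustration, and the cluster-specific pre-processing strategies mentioned in the abstract are not specified there, so they are omitted here.

```python
import lzma

# Cluster names follow the abstract; everything else here is hypothetical.
CLUSTERS = ("cds", "intron", "rna", "residual")


def split_into_clusters(sequence, annotations):
    """Group annotated regions by label; unannotated bases go to 'residual'.

    `annotations` is an assumed format: non-overlapping half-open
    (start, end, label) intervals with label in CLUSTERS.
    Returns the per-cluster strings and a layout table of (label, length)
    pairs in original order, which is needed to rebuild the sequence.
    """
    parts = {name: [] for name in CLUSTERS}
    layout = []
    pos = 0
    for start, end, label in sorted(annotations):
        if start > pos:  # gap before this annotation is residual
            parts["residual"].append(sequence[pos:start])
            layout.append(("residual", start - pos))
        parts[label].append(sequence[start:end])
        layout.append((label, end - start))
        pos = end
    if pos < len(sequence):  # trailing unannotated bases
        parts["residual"].append(sequence[pos:])
        layout.append(("residual", len(sequence) - pos))
    return {name: "".join(chunks) for name, chunks in parts.items()}, layout


def compress_clusters(clusters):
    """Compress each cluster separately with LZMA."""
    return {name: lzma.compress(text.encode("ascii"))
            for name, text in clusters.items()}


def restore(compressed, layout):
    """Decompress every cluster and re-interleave it using the layout table."""
    data = {name: lzma.decompress(blob).decode("ascii")
            for name, blob in compressed.items()}
    offsets = dict.fromkeys(data, 0)
    out = []
    for label, length in layout:
        start = offsets[label]
        out.append(data[label][start:start + length])
        offsets[label] = start + length
    return "".join(out)


if __name__ == "__main__":
    seq = "ATGGCCGTAAGTTTTTACGTCCCC"
    ann = [(0, 6, "cds"), (6, 12, "intron"), (16, 20, "rna")]
    clusters, layout = split_into_clusters(seq, ann)
    blobs = compress_clusters(clusters)
    print(restore(blobs, layout) == seq)
```

Keeping a separate layout table is one simple way to make the regrouping lossless; a real compressor would also need to store that table compactly, which this sketch does not attempt.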
Pages: 991-995
Related papers
50 in total
  • [1] Design and development of bioinformatics feature based DNA sequence data compression algorithm
    Banerjee K.
    Bali V.
    EAI Endorsed Transactions on Pervasive Health and Technology, 2020, 5 (20):
  • [2] DNA sequence data compression method based on Memetic Algorithm
    Tan, Li
    Sun, Ji-Feng
    Guo, Li-Hua
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2014, 36 (01) : 121 - 127
  • [3] Intelligent DNA sequence data compression using memetic algorithm
    College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, Zhejiang 310027, China
    Unknown
    Unknown
    Tien Tzu Hsueh Pao, 2013, 3 (513-518):
  • [4] Reference based Inter Chromosomal similarity based DNA sequence compression algorithm
    Banerjee, Kakoli
    Prasad, R. A.
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND AUTOMATION (ICCCA), 2017, : 234 - 238
  • [5] A DNA sequence compression algorithm based on LUT and LZ77
    Bao, S
    Chen, S
    Jing, ZQ
    Ren, R
    2005 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Vols 1 and 2, 2005, : 23 - 28
  • [6] Development of Novel Data Compression Technique for Accelerate DNA Sequence Alignment Based on Smith-Waterman Algorithm
    Al Junid, S. A. M.
    Haron, M. A.
    Abd Majid, Z.
    Halim, A. K.
    Osman, F. N.
    Hashim, H.
    2009 THIRD UKSIM EUROPEAN SYMPOSIUM ON COMPUTER MODELING AND SIMULATION (EMS 2009), 2009, : 181 - 186
  • [7] FCompress: An Algorithm for FASTQ Sequence Data Compression
    Sardaraz, Muhammad
    Tahir, Muhammad
    CURRENT BIOINFORMATICS, 2019, 14 (02) : 123 - 129
  • [8] Algorithm for DNA Sequence Compression Based on Prediction of Mismatch Bases and Repeat Location
    Kaipa, Kalyan Kumar
    Bopardikar, Ajit S.
    Abhilash, Srikantha
    Venkataraman, Parthasarathy
    Lee, Kyusang
    Ahn, Taejin
    Narayanan, Rangavittal
    2010 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS (BIBMW), 2010, : 851 - 852
  • [9] A hybrid particle swarm optimization based memetic algorithm for DNA sequence compression
    Li Tan
    Jifeng Sun
    Xueke Tong
    Soft Computing, 2015, 19 : 1255 - 1268
  • [10] A hybrid particle swarm optimization based memetic algorithm for DNA sequence compression
    Tan, Li
    Sun, Jifeng
    Tong, Xueke
    SOFT COMPUTING, 2015, 19 (05) : 1255 - 1268