Bioinformatics features based DNA Sequence data compression algorithm

Authors
Ji, Zhen [1]
Zhou, Jia-Rui [2]
Zhu, Ze-Xuan [1]
Wu, Q.H. [3]
Affiliations
[1] College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
[2] College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, Zhejiang 310027, China
[3] Department of Electrical Engineering and Electronics, The University of Liverpool, Liverpool, L69 3GJ, United Kingdom
Keywords
DNA sequences - Data compression - Markov processes - Clustering algorithms - DNA - Benchmarking
DOI
Not available
Abstract
This paper proposes BioLZMA, a novel DNA sequence data compression algorithm based on bioinformatics features. In BioLZMA, the DNA sequence data are sliced and regrouped into four clusters according to their biological meaning: the coding sequence cluster, the intron cluster, the RNA cluster, and the residual cluster. After targeted compression strategies are applied during data preprocessing, each cluster is compressed separately with the LZMA algorithm. Experimental results on benchmark sequences show that BioLZMA outperforms existing DNA compression algorithms. In particular, on long DNA sequences with pronounced bioinformatics features, BioLZMA achieves a higher compression ratio with little computation time.
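The cluster-then-compress idea described in the abstract can be sketched in Python with the standard `lzma` module. This is a minimal illustration under stated assumptions, not the authors' implementation: the annotation format (half-open `(start, end, label)` intervals), the cluster labels `cds`, `intron`, `rna` and `residual`, and every function name below are hypothetical. The paper's targeted preprocessing strategies are not reproduced here, so each cluster is simply LZMA-compressed as plain ASCII text. Because grouping bases by cluster discards their original order, the annotation itself must also be kept in order to reassemble the sequence.

```python
import lzma
from typing import Dict, List, Tuple

# Hypothetical annotation format: half-open (start, end, label) intervals.
# Bases not covered by any interval fall into the "residual" cluster.
Interval = Tuple[int, int, str]
CLUSTERS = ("cds", "intron", "rna", "residual")


def label_positions(length: int, annotation: List[Interval]) -> List[str]:
    """Assign a cluster label to every base position (later intervals win)."""
    labels = ["residual"] * length
    for start, end, label in annotation:
        if label not in CLUSTERS:
            raise ValueError(f"unknown cluster label: {label!r}")
        if not 0 <= start <= end <= length:
            raise ValueError(f"interval out of range: {(start, end)}")
        for i in range(start, end):
            labels[i] = label
    return labels


def split_into_clusters(seq: str, annotation: List[Interval]) -> Dict[str, str]:
    """Concatenate the bases of each region type into one string per cluster."""
    parts: Dict[str, List[str]] = {name: [] for name in CLUSTERS}
    for base, label in zip(seq, label_positions(len(seq), annotation)):
        parts[label].append(base)
    return {name: "".join(chars) for name, chars in parts.items()}


def compress_clusters(clusters: Dict[str, str]) -> Dict[str, bytes]:
    """Compress each cluster independently with LZMA (xz container)."""
    return {name: lzma.compress(data.encode("ascii"), preset=9)
            for name, data in clusters.items()}


def decompress_clusters(blobs: Dict[str, bytes]) -> Dict[str, str]:
    """Inverse of compress_clusters."""
    return {name: lzma.decompress(blob).decode("ascii")
            for name, blob in blobs.items()}


def reassemble(clusters: Dict[str, str], annotation: List[Interval],
               length: int) -> str:
    """Interleave cluster contents back into the original base order."""
    cursors = {name: 0 for name in CLUSTERS}
    out = []
    for label in label_positions(length, annotation):
        out.append(clusters[label][cursors[label]])
        cursors[label] += 1
    return "".join(out)


if __name__ == "__main__":
    seq = "ATGGCCATTGTAATGGGCCGC" * 40  # toy sequence, not real genomic data
    annotation = [(0, 300, "cds"), (300, 500, "intron"), (600, 700, "rna")]
    blobs = compress_clusters(split_into_clusters(seq, annotation))
    restored = reassemble(decompress_clusters(blobs), annotation, len(seq))
    assert restored == seq
    print({name: len(blob) for name, blob in blobs.items()})
```

The intuition behind splitting first is that each LZMA stream then sees statistically more uniform input. Whether this beats compressing the whole sequence in one stream depends on the data and on the cost of storing the annotation, so a fair comparison would measure both totals on real sequences rather than on the toy input above.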
Pages: 991-995
Related papers
  • [31] ReCoil - an algorithm for compression of extremely large datasets of DNA data
    Yanovsky, Vladimir
    Algorithms for Molecular Biology, 2011, 6
  • [32] BIND – An algorithm for loss-less compression of nucleotide sequence data
    Bose, Tungadri
    Mohammed, Monzoorul Haque
    Dutta, Anirban
    Mande, Sharmila S.
    Journal of Biosciences, 2012, 37 : 785 - 789
  • [34] Genome Sequence compression algorithm based on the Distributed source coding
    Shao, Jing-Jing
    Proceedings of the 2016 6th International Conference on Machinery, Materials, Environment, Biotechnology and Computer (MMEBC), 2016, 88 : 1795 - 1799
  • [35] Data Compression Concepts and Algorithms and Their Applications to Bioinformatics
    Nalbantoglu, Oezkan U.
    Russell, David J.
    Sayood, Khalid
    Entropy, 2010, 12 (01) : 34 - 52
  • [36] DNA sequence based data classification technique
    Pandey, Subhash Chandra
    Singh, Saket Kumar
    CSI Transactions on ICT, 2015, 3 (1) : 59 - 69
  • [37] Encryption Algorithm Based on DNA Strand Displacement and DNA Sequence Operation
    Zou, Chengye
    Wei, Xiaopeng
    Zhang, Qiang
    Zhou, Changjun
    Zhou, Shuang
    IEEE Transactions on NanoBioscience, 2021, 20 (02) : 223 - 234
  • [38] DNA Sequence Alignment Based on Ants' Colony Algorithm
    Wu, Tianyu
    2nd International Conference on Frontiers of Biological Sciences and Engineering (FSBE 2019), 2020, 2208
  • [39] DNA sequence classification based on MLP with PILAE algorithm
    Mahmoud, Mohammed A. B.
    Guo, Ping
    Soft Computing, 2021, 25 (05) : 4003 - 4014
  • [40] Design of DNA sequence based on improved genetic algorithm
    Wang, Bin
    Zhang, Qiang
    Zhang, Rui
    Advanced Intelligent Computing Theories and Applications, Proceedings: With Aspects of Theoretical and Methodological Issues, 2008, 5226 : 9 - +