A simple statistical algorithm for biological sequence compression

Cited by: 0
Authors
Cao, Minh Duc [1]
Dix, Trevor I. [1]
Allison, Lloyd [1]
Mears, Chris [1]
Affiliations
[1] Monash Univ, Fac Informat Technol, Clayton, Vic 3168, Australia
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
This paper introduces a novel algorithm for biological sequence compression that makes use of both statistical properties and repetition within sequences. A panel of experts is maintained to estimate the probability distribution of the next symbol in the sequence to be encoded. Expert probabilities are combined to obtain the final distribution. The resulting information sequence provides insight for further study of the biological sequence. Each symbol is then encoded by arithmetic coding. Experiments show that our algorithm outperforms existing compressors on typical DNA and protein sequence datasets while maintaining a practical running time.
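As a rough illustration of the "panel of experts" idea in the abstract, the sketch below blends the predictions of several experts into one distribution and records the information content (in bits) of each symbol, which is roughly what an arithmetic coder would spend on it. Everything here is an assumption made for illustration: the abstract does not specify the experts or the combination rule, so `OrderKExpert`, `information_sequence`, the add-one smoothing, and the Bayesian weight update are hypothetical stand-ins, not the authors' method. The arithmetic coder itself is omitted.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: DNA alphabet; proteins would need 20 symbols


class OrderKExpert:
    """Hypothetical expert: order-k context model with add-one smoothed counts."""

    def __init__(self, k):
        self.k = k
        self.counts = defaultdict(lambda: {s: 1 for s in ALPHABET})

    def _context(self, history):
        # history[-0:] would return the whole string, so handle k == 0 explicitly.
        return history[-self.k:] if self.k else ""

    def predict(self, history):
        c = self.counts[self._context(history)]
        total = sum(c.values())
        return {s: c[s] / total for s in ALPHABET}

    def update(self, history, symbol):
        self.counts[self._context(history)][symbol] += 1


def information_sequence(seq, experts):
    """Return the per-symbol cost in bits under a weighted blend of experts."""
    weights = [1.0 / len(experts)] * len(experts)
    info = []
    for i, symbol in enumerate(seq):
        history = seq[:i]
        preds = [e.predict(history) for e in experts]
        # Final distribution: weighted average of the expert distributions.
        mixed = {s: sum(w * p[s] for w, p in zip(weights, preds)) for s in ALPHABET}
        info.append(-math.log2(mixed[symbol]))
        # Assumed combination rule: Bayesian reweighting by how well each
        # expert predicted the symbol that actually occurred.
        weights = [w * p[symbol] for w, p in zip(weights, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
        for e in experts:
            e.update(history, symbol)
    return info


seq = "ACGT" * 50
info = information_sequence(seq, [OrderKExpert(0), OrderKExpert(2)])
total_bits = sum(info)  # approximate compressed size in bits
```

The sum of the information sequence approximates the compressed length, and stretches where the per-symbol cost drops indicate regions the experts predict well, such as repeats.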
Pages: 43 / +
Page count: 3
Related papers
50 in total
  • [1] SeqCompress: An algorithm for biological sequence compression
    Sardaraz, Muhammad
    Tahir, Muhammad
    Ikram, Ataul Aziz
    Bajwa, Hassan
    GENOMICS, 2014, 104 (04) : 225 - 228
  • [2] Sequence Statistical Code Based Data Compression Algorithm for Wireless Sensor Network
    S. Jancy
    C. Jayakumar
    Wireless Personal Communications, 2019, 106 : 971 - 985
  • [3] Sequence Statistical Code Based Data Compression Algorithm for Wireless Sensor Network
    Jancy, S.
    Jayakumar, C.
    WIRELESS PERSONAL COMMUNICATIONS, 2019, 106 (03) : 971 - 985
  • [4] A Biological Sequence Compression Algorithm Based on Variable Length LUT and LZ 77
    Bharti, Rajendra Kumar
    Verma, Archana
    Singh, R. K.
    2010 INTERNATIONAL CONFERENCE ON NETWORKING AND INFORMATION TECHNOLOGY (ICNIT 2010), 2010 : 507 - 511
  • [5] Statistical significance in biological sequence analysis
    Mitrophanov, Alexander Yu.
    Borodovsky, Mark
    BRIEFINGS IN BIOINFORMATICS, 2006, 7 (01) : 2 - 24
  • [6] A lossless reference-free sequence compression algorithm leveraging grammatical, statistical, and substitution rules
    Roy, Subhankar
    Kumar Maity, Dilip
    Mukhopadhyay, Anirban
    BRIEFINGS IN FUNCTIONAL GENOMICS, 2025, 24
  • [7] A simple algorithm of fractal image compression
    Qi, Li-Min
    Liu, Wen-Yao
    Yuan, Li
    Chen, Zhi-Hong
    Tianjin Daxue Xuebao (Ziran Kexue yu Gongcheng Jishu Ban)/Journal of Tianjin University Science and Technology, 2008, 41 (10) : 1152 - 1156
  • [8] A simple algorithm for the constrained sequence problems
    Chin, FYL
    De Santis, A
    Ferrara, AL
    Ho, NL
    Kim, SK
    INFORMATION PROCESSING LETTERS, 2004, 90 (04) : 175 - 179
  • [9] Statistical encoding algorithm for hierarchical image compression
    Gashnikov M.V.
    Optical Memory and Neural Networks, 2017, 26 (4) : 274 - 279
  • [10] FCompress: An Algorithm for FASTQ Sequence Data Compression
    Sardaraz, Muhammad
    Tahir, Muhammad
    CURRENT BIOINFORMATICS, 2019, 14 (02) : 123 - 129