Two Novel Techniques for Space Compaction on Biological Sequences

Authors
Volis, George [1 ]
Makris, Christos [1 ]
Kanavos, Andreas [1 ]
Affiliations
[1] Univ Patras, Dept Comp Engn & Informat, Patras 26504, Greece
Keywords
Searching and Browsing; Web Information Filtering and Retrieval; Text Mining; Indexing Structures; Inverted Files; Index Compression; n-Gram Indexing; Sequence Analysis and Assembly; COMPUTATION
DOI
10.5220/0005801101050112
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
The number and size of genomic databases have grown rapidly in recent years, and with them the number of Internet-accessible databases. There is therefore a need for satisfactory methods of managing this growing body of information, and considerable effort has been devoted to this goal. Contributing to this effort, this paper presents two algorithms that reduce the amount of space needed to store genomic information. The first algorithm is based on the classic n-grams/2L technique for indexing a DNA sequence and converts the Inverted Index of that algorithm to a more compressed format. Researchers have revealed the existence of repeated and palindromic patterns in the DNA of living organisms. This observation motivates the technique, which proposes an alternative data structure for handling these sequences. Our experimental results show that our algorithm achieves a more efficient index than the n-grams/2L algorithm and can be adopted by any algorithm that is based on n-grams/2L. The second algorithm is based on the n-grams technique. Treating the four symbols of the DNA alphabet as the vertices of a square, it represents a DNA sequence as a relation between the vertices, sides, and diagonals of the square. The experimental results show that this second approach compresses our index structure even more effectively.
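As a rough illustration of the two ideas named in the abstract, the sketch below builds a plain single-level n-gram inverted index over a DNA string and classifies consecutive symbols by their relation on a square. It is not the paper's method: the two-level n-grams/2L structure and the actual compressed formats are not described here. The function names, the clockwise corner order `ACGT`, and the labels `vertex`, `side`, and `diagonal` are assumptions made for this example only.

```python
from collections import defaultdict

def ngram_inverted_index(sequence: str, n: int) -> dict[str, list[int]]:
    """Map each n-gram to the ascending list of offsets where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - n + 1):
        index[sequence[i:i + n]].append(i)
    return dict(index)

# Assumption: A, C, G, T sit clockwise on the four corners of a square.
SQUARE_ORDER = "ACGT"

def square_relations(sequence: str) -> list[str]:
    """Label each consecutive symbol pair by its relation on the square:
    'vertex' (same corner), 'side' (adjacent corners), 'diagonal' (opposite corners)."""
    labels = {0: "vertex", 1: "side", 2: "diagonal", 3: "side"}
    relations = []
    for a, b in zip(sequence, sequence[1:]):
        step = (SQUARE_ORDER.index(b) - SQUARE_ORDER.index(a)) % 4
        relations.append(labels[step])
    return relations

if __name__ == "__main__":
    print(ngram_inverted_index("ACGTACGT", 3))
    print(square_relations("AGTA"))
```

Repeated n-grams such as `ACG` in `ACGTACGT` collect several offsets in one posting list, which is where index compression has room to work. The relation labels alone are lossy (a `side` step does not say which direction was taken), so this only illustrates the classification, not a reversible encoding.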
Pages: 105-112
Number of pages: 8