Two Novel Techniques for Space Compaction on Biological Sequences

Cited by: 1
Authors
Volis, George [1]
Makris, Christos [1]
Kanavos, Andreas [1]
Affiliation
[1] Univ Patras, Dept Comp Engn & Informat, Patras 26504, Greece
Keywords
Searching and Browsing; Web Information Filtering and Retrieval; Text Mining; Indexing Structures; Inverted Files; Index Compression; n-Gram Indexing; Sequence Analysis and Assembly; COMPUTATION
DOI
10.5220/0005801101050112
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The number and size of genomic databases have grown rapidly in recent years, and the number of Internet-accessible databases has grown with them. There is therefore a need for effective methods of managing this growing body of information, and considerable effort has been put in this direction. Contributing to this effort, this paper presents two algorithms that reduce the space needed to store genomic information. The first algorithm builds on the classic n-grams/2L technique for indexing a DNA sequence and converts the inverted index of that technique into a more compressed format. Researchers have revealed the existence of repeated and palindromic patterns in the DNA of living organisms; this observation is the main motivation for the technique, which proposes an alternative data structure for handling these sequences. Our experimental results show that this algorithm produces a more efficient index than the n-grams/2L algorithm and can be adopted by any algorithm based on n-grams/2L. The second algorithm is based on the n-grams technique. Treating the four symbols of the DNA alphabet as the vertices of a square, it represents a DNA sequence as a series of relations between the vertices, sides, and diagonals of the square. The experimental results show that this second idea compresses our index structure even further.
Pages: 105-112
Number of pages: 8
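The abstract names two indexing ideas without giving their details, so the following is only a minimal sketch under stated assumptions. `ngram_index` builds a plain one-level n-gram inverted index (the kind of index that n-grams/2L restructures, not n-grams/2L itself). `square_encode` and `square_decode` show one hypothetical reading of the square scheme: A, C, G and T are placed clockwise on the corners of a square, so each step between consecutive bases is either a stay on the same vertex, a move along a side, or a move along a diagonal. All function names and the corner layout are illustrative assumptions, not the authors' method.

```python
from __future__ import annotations

from collections import defaultdict


def ngram_index(seq: str, n: int) -> dict[str, list[int]]:
    """Map every n-gram of seq to the ascending list of its start offsets.

    This is a plain one-level n-gram inverted index, not the two-level n-grams/2L.
    """
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - n + 1):
        index[seq[i:i + n]].append(i)
    return dict(index)


# Hypothetical corner assignment: A, C, G, T placed clockwise on a square,
# so A-G and C-T are the diagonals. The paper's actual layout may differ.
CORNERS = "ACGT"
POS = {base: k for k, base in enumerate(CORNERS)}

# Clockwise offset between two corners -> name of the move on the square.
MOVES = {0: "stay", 1: "side_cw", 2: "diagonal", 3: "side_ccw"}
DELTA = {name: d for d, name in MOVES.items()}


def square_encode(seq: str) -> tuple[str, list[str]]:
    """Encode seq as its first base plus the move reaching each following base.

    Assumes seq contains only A, C, G and T.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    moves = [MOVES[(POS[b] - POS[a]) % 4] for a, b in zip(seq, seq[1:])]
    return seq[0], moves


def square_decode(first: str, moves: list[str]) -> str:
    """Invert square_encode by replaying the moves from the first corner."""
    out = [first]
    k = POS[first]
    for m in moves:
        k = (k + DELTA[m]) % 4
        out.append(CORNERS[k])
    return "".join(out)


if __name__ == "__main__":
    print(ngram_index("ACGTACGA", 3)["ACG"])  # [0, 4]
    first, moves = square_encode("AAGCT")
    print(first, moves)  # A ['stay', 'diagonal', 'side_ccw', 'diagonal']
    print(square_decode(first, moves))  # AAGCT
```

Because the four moves still need two bits each, the move sequence on its own is no smaller than the raw sequence; whatever compression the second algorithm achieves must come from how the paper exploits these relations inside its index, which the abstract does not describe.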
Related papers
  • [1] New static compaction techniques of Test Sequences for sequential circuits
    Corno, F
    Prinetto, P
    Rebaudengo, M
    Reorda, MS
    EUROPEAN DESIGN & TEST CONFERENCE - ED&TC 97, PROCEEDINGS, 1997 : 37 - 43
  • [2] A Novel Method of Characterizing Genetic Sequences: Genome Space with Biological Distance and Applications
    Deng, Mo
    Yu, Chenglong
    Liang, Qian
    He, Rong L.
    Yau, Stephen S. -T.
    PLOS ONE, 2011, 6 (03)
  • [3] Novel compaction techniques with pellet-containing granules
    Pan, Xin
    Chen, Meiwan
    Han, Ke
    Peng, Xinsheng
    Wen, Xinguo
    Chen, Bao
    Wang, Jin
    Li, Ge
    Wu, Chuanbin
    EUROPEAN JOURNAL OF PHARMACEUTICS AND BIOPHARMACEUTICS, 2010, 75 (03) : 436 - 442
  • [4] CLUSTER VISUALIZATION AND NONLINEAR PROJECTION TECHNIQUES FOR BIOLOGICAL SEQUENCES
    Ferles, C.
    Stafylopatis, A.
    NEURAL NETWORK WORLD, 2016, 26 (03) : 289 - 303
  • [5] Two-Dimensional Static Test Compaction for Functional Test Sequences
    Pomeranz, Irith
    IEEE TRANSACTIONS ON COMPUTERS, 2015, 64 (10) : 3009 - 3015
  • [6] Sequences and topology: the completeness of biological space - Editorial overview
    Tramontano, Anna
    Pearson, William R.
    CURRENT OPINION IN STRUCTURAL BIOLOGY, 2007, 17 (03) : 334 - 336
  • [7] Equivalence of two Fourier methods for biological sequences
    Coward, Eivind
    JOURNAL OF MATHEMATICAL BIOLOGY, 1997, 36 (01) : 64 - 70
  • [8] A Novel Method to Analyze the Similarity of Biological Sequences
    Huang, Wei
    Guo, Ying
    Zhang, Jianmin
    JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, 2009, 26 (05) : 599 - 608
  • [9] Novel techniques for visualising biological information
    Robinson, AJ
    Flores, TP
    ISMB-97 - FIFTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS FOR MOLECULAR BIOLOGY, PROCEEDINGS, 1997 : 241 - 249