A fast heuristic algorithm for similarity search in large DNA databases

Authors
Jeong, In-Seon [1]
Park, Kyoung-Wook [1]
Lim, Hyeong-Seok [1]
Affiliations
[1] Chonnam Natl Univ, Dept Comp Sci, Kwangju, South Korea
DOI
10.1109/FBIT.2007.131
Chinese Library Classification (CLC) codes
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Similarity search is an important procedure in genomic research. Given the enormous size of DNA sequence databases, it becomes impractical to use basic pattern matching methods such as the Smith-Waterman algorithm to compute similarities between two sequences. In this paper, we propose an efficient query processing method based on indexing. It uses little storage and rapidly finds the similarity between two sequences in a DNA sequence database. First, our algorithm partitions sequences into equal-length windows. It then maps the subsequence in each window into a multidimensional vector space by indexing character frequencies, using the positional information of the characters within the subsequence as weights. Our algorithm not only has linear time complexity but also improves the accuracy of query processing. Experimental results show that our algorithm processes queries more accurately, and with a lower error ratio, than the well-known BLAST and than heuristic algorithms that use only character frequencies.
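The two steps described in the abstract (equal-length windowing, then a position-weighted frequency vector per window) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the weighting formula, so the weight `(i + 1) / n` for the character at position `i` of a window of length `n`, the non-overlapping windows, the L1 distance, and all function names are hypothetical choices.

```python
ALPHABET = "ACGT"  # assumed DNA alphabet; other symbols are ignored


def windows(seq, w):
    """Split seq into consecutive, non-overlapping windows of length w.

    Assumption: a trailing partial window is dropped.
    """
    return [seq[i:i + w] for i in range(0, len(seq) - w + 1, w)]


def weighted_vector(sub):
    """Map a subsequence to a 4-dimensional vector.

    Each component is the sum of position weights (i + 1) / n over the
    occurrences of that base. The weight formula is an assumption made
    for illustration; the abstract does not specify it.
    """
    n = len(sub)
    vec = [0.0] * len(ALPHABET)
    for i, ch in enumerate(sub):
        j = ALPHABET.find(ch)
        if j >= 0:
            vec[j] += (i + 1) / n
    return vec


def index_sequence(seq, w):
    """Build one weighted vector per window; one linear pass over seq."""
    return [weighted_vector(s) for s in windows(seq, w)]


def l1_distance(u, v):
    """L1 distance between two vectors (an assumed similarity measure)."""
    return sum(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    a = weighted_vector("ACGT")
    b = weighted_vector("TGCA")
    print(a, b, l1_distance(a, b))
```

The windows `ACGT` and `TGCA` have identical plain character counts, so a frequency-only index cannot tell them apart, while the position-weighted vectors differ. That is the kind of distinction the abstract attributes to including positional information.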
Pages: 335-340
Page count: 6