A fast heuristic algorithm for similarity search in large DNA databases

Cited by: 1

Authors:

Jeong, In-Seon [1]

Park, Kyoung-Wook [1]

Lim, Hyeong-Seok [1]

Affiliation:

[1] Chonnam Natl Univ, Dept Comp Sci, Kwangju, South Korea

Source:

PROCEEDINGS OF THE FRONTIERS IN THE CONVERGENCE OF BIOSCIENCE AND INFORMATION TECHNOLOGIES | 2007

Keywords:

DOI:

10.1109/FBIT.2007.131

Chinese Library Classification (CLC) number:

Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];

Discipline classification codes:

071005 ; 0836 ; 090102 ; 100705 ;

Abstract:

Similarity search is an important procedure in genomic research. Given the enormous size of DNA sequence databases, it becomes impractical to use basic pattern matching methods such as the Smith-Waterman algorithm to compute similarities between two sequences. In this paper, we propose an efficient query processing method based on indexing. It uses little storage and rapidly finds similarities between sequences in a DNA sequence database. First, our algorithm partitions sequences into equal-length windows. It then transforms the subsequence in each window into a multidimensional vector space by indexing character frequencies, weighted by the positional information of the characters in the subsequence. Our algorithm not only has linear time complexity but also improves the accuracy of query processing. Experimental results show that our algorithm processes queries more accurately, with a lower error ratio, than the well-known BLAST and than heuristic algorithms that use only character frequencies.
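
The windowing and vector transformation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting scheme, so the one used here (each occurrence of a nucleotide contributes its 1-based position divided by the window length) is an assumption chosen only for illustration, as are the names `window_vector`, `index_sequence`, and `distance`. It is not the authors' implementation.

```python
from typing import List, Tuple

ALPHABET = "ACGT"  # one vector component per nucleotide


def window_vector(window: str) -> Tuple[float, ...]:
    """Map one window to a 4-dimensional vector.

    Assumed weighting (hypothetical, not taken from the paper): every
    occurrence of a character adds its 1-based position divided by the
    window length, so the vector reflects both frequency and position.
    """
    n = len(window)
    weights = {ch: 0.0 for ch in ALPHABET}
    for pos, ch in enumerate(window, start=1):
        if ch in weights:  # characters outside ACGT are ignored
            weights[ch] += pos / n
    return tuple(weights[ch] for ch in ALPHABET)


def index_sequence(seq: str, w: int) -> List[Tuple[float, ...]]:
    """Split seq into non-overlapping windows of length w and vectorize each.

    A trailing fragment shorter than w is dropped in this sketch.
    """
    return [window_vector(seq[i:i + w]) for i in range(0, len(seq) - w + 1, w)]


def distance(u: Tuple[float, ...], v: Tuple[float, ...]) -> float:
    """Euclidean distance between two window vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
```

Under this assumed weighting, the windows `AACC` and `CCAA` map to different vectors even though their plain character counts are identical, which is the kind of distinction that a frequency-only index cannot make. Each sequence is scanned once, so building the index takes time linear in the sequence length.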


Pages: 335-340

Page count: 6
