An efficient similarity search based on indexing in large DNA databases

Cited by: 7

Authors:

Jeong, In-Seon [1]

Park, Kyoung-Wook [1]

Kang, Seung-Ho [1]

Lim, Hyeong-Seok [1]

Affiliation:

[1] Chonnam Natl Univ, Sch Elect & Comp Eng, Kwangju 500757, South Korea

Source:

COMPUTATIONAL BIOLOGY AND CHEMISTRY | 2010 / Volume 34 / Issue 02

Keywords:

Similarity search; Approximate string matching; Indexing; DNA sequence; HOMOLOGY SEARCH;

DOI:

10.1016/j.compbiolchem.2010.03.007

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Index-based search algorithms are an important part of genomic search, and index construction is the key to how well such an algorithm computes similarities between two DNA sequences. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. It requires little storage and rapidly finds the similarity between two sequences in a DNA sequence database. First, a sequence is partitioned into equal-length windows. We select the likely subsequences by computing the Hamming distance to the query sequence. The algorithm then transforms the subsequences in each window into a multidimensional vector space by indexing the frequencies of the characters, including the positional information of the characters in the subsequences. Our experimental results show that the algorithm runs faster than other heuristic algorithms based on index structures, while being as accurate as those heuristic algorithms. (C) 2010 Elsevier Ltd. All rights reserved.
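
The three steps named in the abstract (equal-length windows, a Hamming-distance filter against the query, and a frequency vector that also carries positional information) can be illustrated with a minimal Python sketch. The abstract does not specify the actual transformation, so the encoding used here (per-base count plus mean position within the window) is an assumption chosen for illustration, as are the function names, the non-overlapping windows, and the choice of a window length equal to the query length. It is not the authors' index structure.

```python
ALPHABET = "ACGT"


def windows(seq, w):
    """Split seq into consecutive, non-overlapping windows of length w.

    A trailing partial window is dropped (an assumption of this sketch).
    Returns (start_offset, subsequence) pairs.
    """
    return [(i, seq[i:i + w]) for i in range(0, len(seq) - w + 1, w)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def feature_vector(sub):
    """Map a subsequence to an 8-dimensional vector.

    For each base in ALPHABET the vector holds its count and its mean
    position in the window. The mean position is a hypothetical stand-in
    for the "positional information" mentioned in the abstract.
    """
    vec = []
    for base in ALPHABET:
        positions = [i for i, c in enumerate(sub) if c == base]
        count = len(positions)
        mean_pos = sum(positions) / count if count else 0.0
        vec.extend([count, mean_pos])
    return vec


def candidate_windows(seq, query, max_dist):
    """Keep windows within max_dist mismatches of the query, with vectors."""
    w = len(query)
    return [
        (start, sub, feature_vector(sub))
        for start, sub in windows(seq, w)
        if hamming(sub, query) <= max_dist
    ]


if __name__ == "__main__":
    for start, sub, vec in candidate_windows("ACGTACGAACGT", "ACGT", 1):
        print(start, sub, vec)
```

In a real index the vectors would be stored in a multidimensional access structure and compared by a vector distance; this sketch only shows how the candidates and their vectors might be produced.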


Pages: 131-136

Page count: 6
