An efficient similarity search based on indexing in large DNA databases

Cited by: 7
Authors
Jeong, In-Seon [1]
Park, Kyoung-Wook [1]
Kang, Seung-Ho [1]
Lim, Hyeong-Seok [1]
Affiliations
[1] Chonnam Natl Univ, Sch Elect & Comp Eng, Kwangju 500757, South Korea
Keywords
Similarity search; Approximate string matching; Indexing; DNA sequence; Homology search
DOI
10.1016/j.compbiolchem.2010.03.007
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Index-based search algorithms are an important part of genomic search, and the way the index is constructed largely determines how well such an algorithm computes similarities between two DNA sequences. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. It requires little storage and rapidly finds the similarity between two sequences in a DNA sequence database. First, a sequence is partitioned into equal-length windows. We select the likely subsequences by computing their Hamming distance to the query sequence. The algorithm then transforms the subsequences in each window into a multidimensional vector space by indexing the frequencies of the characters, including the positional information of the characters in the subsequences. Our experiments show that the algorithm runs faster than other heuristic algorithms based on index structures while being as accurate as those algorithms. (C) 2010 Elsevier Ltd. All rights reserved.
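The abstract names three steps: equal-length windowing, a Hamming-distance filter against the query, and a mapping of each retained subsequence to a vector of character frequencies with positional information. The following is a minimal sketch of that pipeline, not the authors' implementation. The function names, the non-overlapping window layout, the `max_dist` threshold, and the use of each character's mean position as the "positional information" are all assumptions made for illustration; the abstract does not specify them.

```python
ALPHABET = "ACGT"  # assumed DNA alphabet; ambiguity codes such as N are ignored


def windows(seq, w):
    """Split seq into consecutive non-overlapping windows of length w.

    Assumption: windows do not overlap and a short tail is dropped.
    """
    return [(i, seq[i:i + w]) for i in range(0, len(seq) - w + 1, w)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def feature_vector(sub):
    """Map a subsequence to [count, mean position] for each character.

    Assumption: mean position is a hypothetical stand-in for the
    positional information mentioned in the abstract.
    """
    vec = []
    for ch in ALPHABET:
        positions = [i for i, c in enumerate(sub) if c == ch]
        vec.append(len(positions))
        vec.append(sum(positions) / len(positions) if positions else 0.0)
    return vec


def candidate_windows(db_seq, query, max_dist):
    """Keep windows within max_dist mismatches of the query, with their vectors."""
    w = len(query)
    return [
        (start, sub, feature_vector(sub))
        for start, sub in windows(db_seq, w)
        if hamming(sub, query) <= max_dist
    ]


if __name__ == "__main__":
    for start, sub, vec in candidate_windows("ACGTACGAACGTTTTT", "ACGT", 1):
        print(start, sub, vec)
```

In this sketch the Hamming filter discards windows before any vector is built, so only likely matches pay the cost of the transformation. A real index would store the vectors ahead of query time rather than computing them per query.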
Pages: 131-136
Page count: 6