An efficient similarity search based on indexing in large DNA databases

Cited by: 7
Authors
Jeong, In-Seon [1]
Park, Kyoung-Wook [1]
Kang, Seung-Ho [1]
Lim, Hyeong-Seok [1]
Affiliation
[1] Chonnam Natl Univ, Sch Elect & Comp Eng, Kwangju 500757, South Korea
Keywords
Similarity search; Approximate string matching; Indexing; DNA sequence; Homology search
DOI
10.1016/j.compbiolchem.2010.03.007
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Index-based search algorithms are an important part of genomic search, and index construction is the key to computing similarities between two DNA sequences with such an algorithm. In this paper, we propose an efficient query processing method that uses special transformations to construct an index. It requires little storage and rapidly finds the similarity between two sequences in a DNA sequence database. First, a sequence is partitioned into equal-length windows. We select the likely subsequences by computing their Hamming distance to the query sequence. The algorithm then transforms the subsequences in each window into a multidimensional vector space by indexing the frequencies of the characters, including the positional information of the characters in the subsequences. Our experimental results show that the algorithm runs faster than other heuristic algorithms based on index structures, while being as accurate as those algorithms. © 2010 Elsevier Ltd. All rights reserved.
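The pipeline outlined in the abstract (fixed-length windows, a Hamming-distance filter against the query, then a frequency-plus-position vector per subsequence) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the transformation, so the 8-dimensional vector below (per-base counts followed by per-base mean positions), the non-overlapping windows of query length, and all function names are assumptions made purely for illustration.

```python
from typing import List, Tuple

ALPHABET = "ACGT"  # assumed DNA alphabet


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def windows(seq: str, w: int) -> List[Tuple[int, str]]:
    """Partition seq into consecutive non-overlapping windows of length w.

    A trailing fragment shorter than w is dropped (an assumption).
    """
    return [(i, seq[i:i + w]) for i in range(0, len(seq) - w + 1, w)]


def candidates(seq: str, query: str, max_dist: int) -> List[Tuple[int, str]]:
    """Keep windows whose Hamming distance to the query is at most max_dist."""
    return [(start, sub) for start, sub in windows(seq, len(query))
            if hamming(sub, query) <= max_dist]


def feature_vector(sub: str) -> List[float]:
    """Hypothetical transformation: 4 base counts, then 4 mean positions.

    The mean position (0-based) of a base that does not occur is set to 0.0.
    """
    counts = [sub.count(c) for c in ALPHABET]
    mean_pos = []
    for c, n in zip(ALPHABET, counts):
        total = sum(i for i, ch in enumerate(sub) if ch == c)
        mean_pos.append(total / n if n else 0.0)
    return [float(n) for n in counts] + mean_pos


if __name__ == "__main__":
    hits = candidates("ACGTACGAGGGG", "ACGT", max_dist=1)
    for start, sub in hits:
        print(start, sub, feature_vector(sub))
```

Adding positional terms to the counts lets subsequences with identical base composition but different character order map to different vectors, which is the role the abstract attributes to the positional information.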
Pages: 131-136
Page count: 6
Related papers
50 in total
  • [21] Similarity Search in Graph Databases: A Multi-layered Indexing Approach
    Liang, Yongjiang
    Zhao, Peixiang
    2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), 2017, : 783 - 794
  • [22] Efficient Similarity Search by Combining Indexing and Caching Strategies
    Brisaboa, Nieves R.
    Cerdeira-Pena, Ana
    Gil-Costa, Veronica
    Marin, Mauricio
    Pedreira, Oscar
    SOFSEM 2015: THEORY AND PRACTICE OF COMPUTER SCIENCE, 2015, 8939 : 486 - 497
  • [23] A cell-based high-dimensional indexing scheme for similarity search in multimedia databases
    Chang, JW
    Kim, YC
    6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL I, PROCEEDINGS: INFORMATION SYSTEMS DEVELOPMENT I, 2002, : 51 - 56
  • [24] Efficiently Indexing Large Sparse Graphs for Similarity Search
    Wang, Guoren
    Wang, Bin
    Yang, Xiaochun
    Yu, Ge
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (03) : 440 - 451
  • [25] Indexing large metric spaces for similarity search queries
    Bozkaya, T
    Ozsoyoglu, M
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 1999, 24 (03) : 361 - 404
  • [26] Efficient Bitmap-based Indexing and Retrieval of Similarity Search Image Queries
    Jafari, Omid
    Nagarkar, Parth
    Montano, Jonathan
    2020 IEEE SOUTHWEST SYMPOSIUM ON IMAGE ANALYSIS AND INTERPRETATION (SSIAI 2020), 2020, : 58 - 61
  • [27] Efficient similarity search using the Earth Mover's Distance for large multimedia databases
    Assent, Ira
    Wichterich, Marc
    Meisen, Tobias
    Seidl, Thomas
    2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2008, : 307 - 316
  • [28] Stratified Graph Indexing for efficient search in deep descriptor databases
    Rahman, M. M. Mahabubur
    Tesic, Jelena
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2024, 13 (03)
  • [29] Fast search in DNA sequence databases using punctuation and indexing
    Lu, Yi
    Lu, Shiyong
    Ram, Jeffrey L.
    PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTER SCIENCE AND TECHNOLOGY, 2006, : 351 - +
  • [30] Efficient Similarity Search in Scientific Databases with Feature Signatures
    Uysal, Merih Seran
    Beecks, Christian
    Schmuecking, Jochen
    Seidl, Thomas
    PROCEEDINGS OF THE 27TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 2015,