Fast motif search in protein sequence databases

Cited by: 0
Authors
Zheleva, Elena [1]
Arslan, Abdullah N. [2]
Affiliations
[1] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[2] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
Keywords
regular expression matching; motif search; suffix tree; PROSITE pattern; heuristic; preprocessing
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Regular expression pattern matching is widely used in computational biology. Searching a database of sequences for a motif (a simple regular expression) or its variations is an important interactive process that requires fast motif-matching algorithms. In this paper, we explore and evaluate various representations of the database of sequences using suffix trees for two types of query problems for a given regular expression: 1) find the first match, and 2) find all matches. Answering Problem 1 increases the level and effectiveness of interactive motif exploration. We propose a framework in which Problem 1 can be solved faster than with existing solutions without slowing down the solution of Problem 2. We apply several heuristics both at the level of suffix tree creation, resulting in modified tree representations, and at the level of regular expression matching, where we search subtrees in a predefined order by simulating a deterministic finite automaton created from the given regular expression. The focus of our work is to develop a method for faster retrieval of PROSITE motif (a restricted regular expression) matches from a protein sequence database. We show empirically the effectiveness of our solution using several real protein data sets.
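To make the two query types in the abstract concrete, the following is a minimal Python sketch under stated assumptions. It is not the suffix-tree framework described by the authors: it translates a simplified subset of PROSITE syntax into a Python regular expression and scans the sequences one by one with the standard `re` module. The function names `prosite_to_regex`, `find_all`, and `find_first` are hypothetical, and rarer PROSITE features (such as `>` inside square brackets) are not handled.

```python
import re
from typing import Iterator, List, Optional, Tuple

Match = Tuple[int, int, int]  # (sequence index, start, end)

def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE pattern into Python regex syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Separate the residue specification from an optional (n) or (n,m).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError("empty pattern element")
        core, lo, hi = m.groups()
        if core == "x":                      # any residue
            atom = "."
        elif core.startswith("[") and core.endswith("]"):
            atom = core                      # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            atom = "[^" + core[1:-1] + "]"   # forbidden residues
        else:
            atom = re.escape(core)           # a single literal residue
        if lo:
            atom += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + atom + suffix)
    return "".join(parts)

def find_all(pattern: str, sequences: List[str]) -> Iterator[Match]:
    """Problem 2: lazily yield every match, including overlapping ones."""
    # The lookahead lets a match be reported at every start position.
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    for i, seq in enumerate(sequences):
        for m in rx.finditer(seq):
            yield i, m.start(1), m.end(1)

def find_first(pattern: str, sequences: List[str]) -> Optional[Match]:
    """Problem 1: stop as soon as one match is found."""
    return next(find_all(pattern, sequences), None)

if __name__ == "__main__":
    db = ["MKTAYIAK", "AANGSAANPSA", "NNTT"]
    motif = "N-{P}-[ST]-{P}"
    print(prosite_to_regex(motif))
    print(find_first(motif, db))
    print(list(find_all(motif, db)))
```

Because `find_all` is a generator, `find_first` stops scanning at the first hit, which mirrors the distinction between the two problems. A linear scan like this still reads every sequence up to that hit; the abstract's approach instead preprocesses the database into suffix trees so that the search order itself can be tuned.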
Pages: 670-681
Page count: 12