Accelerating Information Retrieval from Profile Hidden Markov Model Databases

Authors
Tamimi, Ahmad [1]
Ashhab, Yaqoub [2]
Tamimi, Hashem [1, 2]
Affiliations
[1] Palestine Polytech Univ, Coll Informat Technol & Comp Engn, Hebron, Palestine
[2] Palestine Polytech Univ, Palestine Korea Biotechnol Ctr, Hebron, Palestine
Source
PLOS ONE | 2016 / Vol. 11 / Issue 11
Keywords
SEQUENCE; PROTEIN; ALIGNMENT; ALGORITHM; FAMILIES; PROGRAM;
DOI
10.1371/journal.pone.0166358
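Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code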
07 ; 0710 ; 09 ;
Abstract
The profile hidden Markov model (profile-HMM) is an efficient statistical approach to representing protein families. Several databases currently maintain valuable protein sequence information as profile-HMMs. There is increasing interest in improving the efficiency of searching profile-HMM databases to detect sequence-profile or profile-profile homology. However, most efforts to enhance search efficiency have focused on improving the alignment algorithms. Although the performance of these algorithms is fairly acceptable, the growing size of these databases and the increasing demand for batch query searching are strong motivations for further enhancing information retrieval from profile-HMM databases. This work presents a heuristic method to accelerate current profile-HMM homology searching approaches. Rather than focusing on the alignment algorithms, the method remodels the database through clustering to reduce the search space. Using different clustering techniques, 4284 TIGRFAMs profiles were clustered based on their similarities, and a representative was assigned to each cluster. To enhance sensitivity, we propose an extended step that allows clusters to overlap. A validation benchmark of 6000 randomly selected protein sequences was used to query the clustered profiles. To evaluate the efficiency of our approach, speed and recall were measured and compared with the sequential search approach. Using hierarchical, k-means, and connected component clustering techniques followed by the extended overlapping step, we obtained an average reduction in time of 41% and an average recall of 96%. Our results demonstrate that representing profile-HMMs with a clustering-based approach can significantly accelerate data retrieval from profile-HMM databases.
Pages: 26
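
The two-stage idea described in the abstract (compare a query with one representative per cluster, then search only inside the clusters whose representative matches) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: profiles are toy feature vectors, `similarity` is a cosine score standing in for a real sequence-profile or profile-profile alignment score, and the function names, thresholds, and the medoid rule for choosing a representative are hypothetical. Only connected component clustering is shown; the recall definition is likewise an assumed, conventional one.

```python
from itertools import combinations


def similarity(a, b):
    """Toy cosine score; a stand-in for a real alignment score (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def connected_components(profiles, threshold):
    """Link two profiles when their score >= threshold; return the components."""
    parent = {p: p for p in profiles}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in combinations(profiles, 2):
        if similarity(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)
    clusters = {}
    for p in profiles:
        clusters.setdefault(find(p), set()).add(p)
    return list(clusters.values())


def pick_representative(cluster, profiles):
    """Hypothetical rule: the medoid, i.e. the member most similar to the rest."""
    return max(sorted(cluster),
               key=lambda p: sum(similarity(profiles[p], profiles[q]) for q in cluster))


def add_overlap(clusters, reps, profiles, overlap_threshold):
    """Extended step: also place a profile in any cluster whose representative it resembles."""
    extended = [set(c) for c in clusters]
    for i, rep in enumerate(reps):
        for p in profiles:
            if similarity(profiles[p], profiles[rep]) >= overlap_threshold:
                extended[i].add(p)
    return extended


def clustered_search(query, profiles, clusters, reps, rep_cutoff, hit_cutoff):
    """Stage 1 scores representatives; stage 2 scores members of matching clusters."""
    comparisons, candidates = 0, set()
    for cluster, rep in zip(clusters, reps):
        comparisons += 1
        if similarity(query, profiles[rep]) >= rep_cutoff:
            candidates |= cluster
    hits = set()
    for p in candidates:
        comparisons += 1
        if similarity(query, profiles[p]) >= hit_cutoff:
            hits.add(p)
    return hits, comparisons


def recall(found, expected):
    """Fraction of the sequential-search hits that the clustered search recovers."""
    return len(found & expected) / len(expected) if expected else 1.0


# Toy database: three families of three profiles each (made-up vectors).
profiles = {
    "A1": (1.0, 0.0, 0.0), "A2": (0.9, 0.1, 0.0), "A3": (0.8, 0.2, 0.0),
    "B1": (0.0, 1.0, 0.0), "B2": (0.0, 0.9, 0.1), "B3": (0.0, 0.8, 0.2),
    "C1": (0.0, 0.0, 1.0), "C2": (0.1, 0.0, 0.9), "C3": (0.2, 0.0, 0.8),
}
clusters = connected_components(profiles, threshold=0.9)
reps = [pick_representative(c, profiles) for c in clusters]
overlapped = add_overlap(clusters, reps, profiles, overlap_threshold=0.2)

query = (0.95, 0.05, 0.0)
sequential_hits = {p for p, v in profiles.items() if similarity(query, v) >= 0.9}
hits, n_comparisons = clustered_search(query, profiles, clusters, reps, 0.8, 0.9)
print(sorted(hits), n_comparisons, "of", len(profiles), "recall", recall(hits, sequential_hits))
```

In this toy setting the clustered search scores fewer profiles than a sequential scan because two of the three clusters are discarded after a single comparison with their representative. The overlap step trades some of that saving for sensitivity: a profile near a cluster boundary stays reachable through more than one representative.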