A machine learning information retrieval approach to protein fold recognition

Cited by: 151
Authors
Cheng, Jianlin [1 ]
Baldi, Pierre [1 ]
Affiliation
[1] Univ Calif Irvine, Sch Informat & Comp Sci, Inst Genom & Bioinformat, Irvine, CA 92697 USA
DOI
10.1093/bioinformatics/btl102
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Recognizing proteins that have similar tertiary structure is the key step of template-based protein structure prediction methods. Traditionally, a variety of alignment methods are used to identify similar folds, based on sequence similarity and sequence-structure compatibility. Although these methods are complementary, their integration has not been thoroughly exploited. Statistical machine learning methods provide tools for integrating multiple features, but so far these methods have been used primarily for protein and fold classification, rather than for the retrieval problem of fold recognition: finding a proper template for a given query protein.
Results: Here we present a two-stage approach to fold recognition that combines machine learning with information retrieval. First, we use alignment methods to derive pairwise similarity features for query-template protein pairs. We also use global profile-profile alignments in combination with predicted secondary structure, relative solvent accessibility, contact map and beta-strand pairing to extract pairwise structural compatibility features. Second, we apply support vector machines to these features to predict the structural relevance (i.e. in the same fold or not) of the query-template pairs. For each query, the continuous relevance scores are used to rank the templates. The FOLDpro approach is modular, scalable and effective. Compared with 11 other fold recognition methods, FOLDpro yields the best results in almost all standard categories on a comprehensive benchmark dataset. Using predictions of the top-ranked template, the sensitivity is approximately 85%, 56% and 27% at the family, superfamily and fold levels, respectively. Using the 5 top-ranked templates, the sensitivity increases to 90%, 70% and 48%.
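The retrieval step described in the abstract (rank templates for each query by a continuous relevance score, then measure sensitivity over the top-ranked templates) can be illustrated with a minimal Python sketch. It is not the FOLDpro implementation: the scores below are made-up stand-ins for support vector machine outputs, the names `rank_templates` and `top_k_sensitivity` and the toy identifiers are hypothetical, and sensitivity is assumed here to mean the fraction of queries with at least one correct template among the top k.

```python
from collections import defaultdict


def rank_templates(scored_pairs):
    """Group (query, template, score) triples by query and return,
    for each query, its templates sorted by descending score."""
    by_query = defaultdict(list)
    for query, template, score in scored_pairs:
        by_query[query].append((template, score))
    return {
        query: [t for t, _ in sorted(pairs, key=lambda p: p[1], reverse=True)]
        for query, pairs in by_query.items()
    }


def top_k_sensitivity(rankings, relevant, k):
    """Fraction of queries with at least one relevant template in the top k
    (an assumed definition of sensitivity, see the lead-in)."""
    hits = sum(
        1
        for query, ranked in rankings.items()
        if any(t in relevant.get(query, set()) for t in ranked[:k])
    )
    return hits / len(rankings)


# Toy, made-up relevance scores standing in for classifier outputs.
scored_pairs = [
    ("q1", "tA", 0.9), ("q1", "tB", 0.2), ("q1", "tC", -0.5),
    ("q2", "tA", -0.1), ("q2", "tB", 0.4), ("q2", "tC", 0.3),
]
# Hypothetical ground truth: templates sharing the query's fold.
relevant = {"q1": {"tA"}, "q2": {"tC"}}

rankings = rank_templates(scored_pairs)
print(top_k_sensitivity(rankings, relevant, 1))  # 0.5
print(top_k_sensitivity(rankings, relevant, 2))  # 1.0
```

In this toy case the correct template for `q2` is ranked second, so sensitivity rises from 0.5 with one template to 1.0 with two, the same kind of gain the abstract reports when moving from the top-ranked template to the top 5.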
Pages: 1456-1463
Number of pages: 8