Improving protein fold recognition using triplet network and ensemble deep learning

Cited by: 9
Authors
Liu, Yan [1, 2]
Han, Ke [1, 2]
Zhu, Yi-Heng [1, 2]
Zhang, Ying [1, 2]
Shen, Long-Chen [1, 2]
Song, Jiangning [3, 4, 5]
Yu, Dong-Jun [1, 6, 7]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, 200 Xiaolingwei, Nanjing 210094, Peoples R China
[2] Pattern Recognit & Bioinformat Grp, Delft, Netherlands
[3] Monash Univ, Biomed Discovery Inst, Melbourne, Vic 3800, Australia
[4] Monash Univ, Dept Biochem & Mol Biol, Melbourne, Vic 3800, Australia
[5] Monash Univ, Fac Informat Technol, Monash Ctr Data Sci, Melbourne, Vic, Australia
[6] China Comp Federat, Beijing, Peoples R China
[7] China Assoc Artificial Intelligence, Beijing, Peoples R China
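Funding
UK Medical Research Council; National Natural Science Foundation of China; US National Institutes of Health; Australian Research Council
Keywords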
protein fold recognition; bioinformatics; convolutional neural network; ensemble deep learning; triplet loss; HIDDEN MARKOV-MODELS; SEQUENCE; SECONDARY; CLASSIFICATION; DYNAMICS; INFORMATION; PREDICTION; SERVER;
DOI
10.1093/bib/bbab248
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Protein fold recognition is a critical step toward protein structure and function prediction; it aims to identify the most likely fold type of a query protein. In recent years, the development of deep learning (DL) techniques has led to major advances in this field, and the sensitivity of protein fold recognition has improved dramatically as a result. Most DL-based methods take an intermediate bottleneck layer as the feature representation of proteins with new fold types. However, this strategy is indirect and inefficient, and it relies on the assumption that the bottleneck layer provides a good representation of proteins with new fold types. To address this problem, we develop a new computational framework that combines a triplet network with ensemble DL. We first train a DL-based model, termed FoldNet, which uses triplet loss to train a deep convolutional network. FoldNet directly optimizes the protein fold embedding itself, so that proteins with the same fold type lie closer to each other than to those with different fold types in the new embedding space. Subsequently, using the trained FoldNet, we implement a new residue-residue contact-assisted predictor, termed FoldTR, which improves protein fold recognition. Furthermore, we propose a new ensemble DL method, termed FSD_XGBoost, which combines the protein fold embedding with two other discriminative fold-specific features extracted by the DL-based methods SSAfold and DeepFR. The Top 1 sensitivity of FSD_XGBoost increases to 74.8% at the fold level, which is approximately 9% higher than that of the state-of-the-art method. Together, the results suggest that fold-specific features extracted by different DL methods complement each other, and that their combination can further improve recognition at the fold level. The implemented web server of FoldTR and the benchmark datasets are publicly available at http://csbio.njust.edu.cn/bioinf/foldtr/.
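The triplet objective mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance metric, the margin, or the embedding size, so the Euclidean distance, the margin of 1.0, and the function names below are assumptions chosen for illustration; this is the common hinge form of triplet loss, not necessarily the exact formulation used by FoldNet.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-form triplet loss (margin value is an assumption).

    anchor and positive stand for embeddings of proteins with the same
    fold type; negative stands for a protein with a different fold type.
    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings (hypothetical values, not real protein data).
well_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
violating = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
print(well_separated, violating)
```

In the first toy triplet the same-fold pair is already much closer than the different-fold pair, so the loss is zero; in the second the roles are swapped, so the loss is positive and gradient descent would pull the positive closer and push the negative away.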
Pages: 16