Improving protein fold recognition using triplet network and ensemble deep learning

Cited by: 9
Authors
Liu, Yan [1, 2]
Han, Ke [1, 2]
Zhu, Yi-Heng [1, 2]
Zhang, Ying [1, 2]
Shen, Long-Chen [1, 2]
Song, Jiangning [3, 4, 5]
Yu, Dong-Jun [1, 6, 7]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, 200 Xiaolingwei, Nanjing 210094, Peoples R China
[2] Pattern Recognit & Bioinformat Grp, Delft, Netherlands
[3] Monash Univ, Biomed Discovery Inst, Melbourne, Vic 3800, Australia
[4] Monash Univ, Dept Biochem & Mol Biol, Melbourne, Vic 3800, Australia
[5] Monash Univ, Fac Informat Technol, Monash Ctr Data Sci, Melbourne, Vic, Australia
[6] China Comp Federat, Beijing, Peoples R China
[7] China Assoc Artificial Intelligence, Beijing, Peoples R China
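Funding
Australian Research Council; UK Medical Research Council; National Natural Science Foundation of China; US National Institutes of Health
Keywords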
protein fold recognition; bioinformatics; convolutional neural network; ensemble deep learning; triplet loss; HIDDEN MARKOV-MODELS; SEQUENCE; SECONDARY; CLASSIFICATION; DYNAMICS; INFORMATION; PREDICTION; SERVER;
DOI
10.1093/bib/bbab248
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Protein fold recognition is a critical step toward protein structure and function prediction; it aims to identify the most likely fold type of a query protein. In recent years, the development of deep learning (DL) techniques has led to major advances in this field, and the sensitivity of protein fold recognition has improved dramatically as a result. Most DL-based methods take an intermediate bottleneck layer as the feature representation of proteins with new fold types. However, this strategy is indirect and inefficient, and it rests on the assumption that the bottleneck layer yields a good representation of proteins with new fold types. To address this problem, we develop a new computational framework that combines a triplet network with ensemble DL. We first train a DL-based model, termed FoldNet, which uses triplet loss to train a deep convolutional network. FoldNet directly optimizes the protein fold embedding itself, so that proteins with the same fold type lie closer to each other than to proteins with different fold types in the new embedding space. Subsequently, using the trained FoldNet, we implement a new residue-residue contact-assisted predictor, termed FoldTR, which improves protein fold recognition. Furthermore, we propose a new ensemble DL method, termed FSD_XGBoost, which combines the protein fold embedding with two other discriminative fold-specific features extracted by two DL-based methods, SSAfold and DeepFR. The Top 1 sensitivity of FSD_XGBoost increases to 74.8% at the fold level, which is approximately 9% higher than that of the state-of-the-art method. Together, the results suggest that fold-specific features extracted by different DL methods complement each other, and that their combination can further improve fold recognition at the fold level. The implemented web server of FoldTR and the benchmark datasets are publicly available at http://csbio.njust.edu.cn/bioinf/foldtr/.
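The triplet objective described in the abstract can be illustrated with a minimal sketch. It is not the FoldNet implementation: the squared Euclidean distance, the margin value, the function name `triplet_loss` and the two-dimensional toy embeddings are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge-style triplet loss over a batch of embeddings (one per row).

    Assumption: squared Euclidean distance and margin=1.0 are illustrative
    choices, not values taken from the paper.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # anchor to same-fold protein
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # anchor to different-fold protein
    # The loss is zero once the negative is farther than the positive by the margin.
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


# Hypothetical 2-D embeddings: one anchor, one same-fold and one other-fold protein.
anchor = np.array([[0.0, 0.0]])
same_fold = np.array([[0.1, 0.0]])
other_fold = np.array([[2.0, 0.0]])

well_separated = triplet_loss(anchor, same_fold, other_fold)  # margin satisfied
violating = triplet_loss(anchor, other_fold, same_fold)       # roles swapped
print(well_separated, violating)
```

In the first call the same-fold protein is already much closer to the anchor than the other-fold protein, so the loss vanishes; in the second call the roles are swapped and the loss is positive, which is the signal that would push the embedding toward the arrangement the abstract describes.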
Pages: 16