Improving protein fold recognition using triplet network and ensemble deep learning

Times cited: 9

Authors:

Liu, Yan [1,2]

Han, Ke [1,2]

Zhu, Yi-Heng [1,2]

Zhang, Ying [1,2]

Shen, Long-Chen [1,2]

Song, Jiangning [3,4,5]

Yu, Dong-Jun [1,6,7]

Affiliations:

[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, 200 Xiaolingwei, Nanjing 210094, Peoples R China

[2] Pattern Recognit & Bioinformat Grp, Delft, Netherlands

[3] Monash Univ, Biomed Discovery Inst, Melbourne, Vic 3800, Australia

[4] Monash Univ, Dept Biochem & Mol Biol, Melbourne, Vic 3800, Australia

[5] Monash Univ, Fac Informat Technol, Monash Ctr Data Sci, Melbourne, Vic, Australia

[6] China Comp Federat, Beijing, Peoples R China

[7] China Assoc Artificial Intelligence, Beijing, Peoples R China

Source:

BRIEFINGS IN BIOINFORMATICS | 2021 / Vol. 22 / No. 06

Funding:

Australian Research Council; UK Medical Research Council; National Natural Science Foundation of China; US National Institutes of Health;

Keywords:

protein fold recognition; bioinformatics; convolutional neural network; ensemble deep learning; triplet loss; HIDDEN MARKOV-MODELS; SEQUENCE; SECONDARY; CLASSIFICATION; DYNAMICS; INFORMATION; PREDICTION; SERVER;

DOI:

10.1093/bib/bbab248

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline codes:

071010; 081704;

Abstract:

Protein fold recognition is a critical step toward protein structure and function prediction, aiming to provide the most likely fold type of a query protein. In recent years, advances in deep learning (DL) techniques have driven major progress in this important field, and the sensitivity of protein fold recognition has improved dramatically as a result. Most DL-based methods take an intermediate bottleneck layer as the feature representation of proteins with new fold types. However, this strategy is indirect and inefficient, and it rests on the assumption that the bottleneck layer's representation is a good representation of proteins with new fold types. To address this problem, in this work we develop a new computational framework that combines a triplet network with ensemble DL. We first train a DL-based model, termed FoldNet, which uses triplet loss to train a deep convolutional network. FoldNet directly optimizes the protein fold embedding itself, so that in the new protein embedding space, proteins with the same fold type lie closer to each other than to proteins with different fold types. Subsequently, using the trained FoldNet, we implement a new residue-residue contact-assisted predictor, termed FoldTR, which improves protein fold recognition. Furthermore, we propose a new ensemble DL method, termed FSD_XGBoost, which combines the protein fold embedding with two other discriminative fold-specific features extracted by two DL-based methods, SSAfold and DeepFR. The Top 1 sensitivity of FSD_XGBoost increases to 74.8% at the fold level, approximately 9% higher than that of the state-of-the-art method. Together, the results suggest that fold-specific features extracted by different DL methods complement each other, and that their combination can further improve fold recognition at the fold level. The web server implementing FoldTR and the benchmark datasets are publicly available at http://csbio.njust.edu.cn/bioinf/foldtr/.
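The triplet objective and the Top 1 sensitivity metric mentioned in the abstract can be illustrated with a minimal sketch, assuming the standard triplet margin formulation max(0, d(a, p) - d(a, n) + margin) with Euclidean distance. The margin of 1.0, the toy 2-D embeddings, the hypothetical fold labels (`fold_A`, `fold_B`), and the nearest-template ranking used to compute Top 1 sensitivity are all illustrative assumptions; the abstract does not specify FoldNet's loss settings or how FoldTR ranks templates.

```python
import math
from typing import List, Sequence, Tuple

# A labeled embedding: (vector, fold label). Fold labels here are hypothetical.
Labeled = Tuple[Sequence[float], str]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    `positive` shares the anchor's fold; `negative` has a different fold.
    The loss reaches zero once the negative is at least `margin` farther
    from the anchor than the positive is. The margin value is an assumption.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def top1_sensitivity(queries: List[Labeled], templates: List[Labeled]) -> float:
    """Fraction of queries whose nearest template (by embedding distance)
    carries the same fold label. Nearest-template ranking is an assumption."""
    hits = 0
    for q_emb, q_fold in queries:
        _, best_fold = min(templates, key=lambda t: euclidean(q_emb, t[0]))
        if best_fold == q_fold:
            hits += 1
    return hits / len(queries)


if __name__ == "__main__":
    # Same-fold positive is close, different-fold negative is far: no loss.
    print(triplet_loss([0, 0], [0, 1], [3, 4]))  # 0.0
    # Roles swapped, so the ordering is violated: 5.0 - 1.0 + 1.0.
    print(triplet_loss([0, 0], [3, 4], [0, 1]))  # 5.0

    templates = [([0, 0], "fold_A"), ([10, 10], "fold_B")]
    queries = [([1, 0], "fold_A"), ([9, 9], "fold_B"), ([2, 2], "fold_B")]
    print(top1_sensitivity(queries, templates))  # 2 of 3 queries correct
```

Because the loss is a hinge, triplets that already satisfy the margin contribute nothing to training, which is why triplet-based training generally depends on which triplets are sampled; the abstract does not state how FoldNet selects its triplets.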


Pages: 16
