ESMSec: Prediction of Secreted Proteins in Human Body Fluids Using Protein Language Models and Attention

Authors
Wang, Yan [1 ]
Sun, Huiting [1 ]
Sheng, Nan [1 ]
He, Kai [2 ]
Hou, Wenjv [1 ]
Zhao, Ziqi [1 ]
Yang, Qixing [1 ]
Huang, Lan [1 ]
Affiliations
[1] Jilin Univ, Minist Educ, Coll Comp Sci & Technol, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Peoples R China
[2] Univ Michigan, Dept Computat Med & Bioinformat, Ann Arbor, MI 48103 USA
Funding
National Natural Science Foundation of China
Keywords
disease biomarkers; protein language models; multi-head attention; human body fluid; BIOMARKER DISCOVERY;
DOI
10.3390/ijms25126371
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Proteins secreted into human body fluids are potential disease biomarkers. Such biomarkers can support early diagnosis and risk prediction, so the study of secreted proteins in human body fluids has considerable practical value. In recent years, deep-learning-based transformer language models have moved from natural language processing (NLP) into proteomics, leading to the development of protein language models (PLMs) for protein sequence representation. Here, we propose a deep learning framework called ESM Predict Secreted Proteins (ESMSec) to predict proteins secreted into three types of human body fluid. ESMSec is based on the ESM2 model and an attention architecture. First, protein sequences are fed into the ESM2 model and features are extracted from its last hidden layer, so that every input protein is encoded as a fixed 1000 x 480 matrix. Second, multi-head attention followed by a fully connected neural network serves as the classifier, performing a binary classification for each body fluid of whether a protein is secreted into it. Our experiments covered three human body fluids that are important and widely used sources of biomarkers. ESMSec achieved average accuracies of 0.8486, 0.8358, and 0.8325 on the testing datasets for plasma, cerebrospinal fluid (CSF), and seminal fluid, respectively, which on average outperform the state-of-the-art (SOTA) methods. These results indicate that ESM representations can improve prediction performance and have great potential for screening proteins secreted into human body fluids.
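The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how sequences are brought to a fixed length, so zero-padding and truncation are assumed here; the attention has no padding mask and no output projection; mean pooling before the final layer is an assumption; and the weights are random rather than trained. All function and parameter names (`pad_or_truncate`, `multi_head_self_attention`, `classify`, `demo`) are hypothetical. The width 480 matches the hidden size of the 35M-parameter ESM2 checkpoint, although the abstract does not name the checkpoint used.

```python
import math
import random

MAX_LEN = 1000   # fixed sequence length stated in the abstract
EMBED_DIM = 480  # per-residue embedding width stated in the abstract


def pad_or_truncate(embedding, max_len=MAX_LEN, dim=EMBED_DIM):
    """Return a max_len x dim matrix: truncate long inputs, zero-pad short ones.

    Assumption: the abstract only gives the final shape, not the padding rule.
    """
    rows = [list(row) for row in embedding[:max_len]]
    for row in rows:
        if len(row) != dim:
            raise ValueError("each residue vector must have length dim")
    rows.extend([0.0] * dim for _ in range(max_len - len(rows)))
    return rows


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def multi_head_self_attention(x, wq, wk, wv, num_heads):
    """Scaled dot-product self-attention; head outputs are concatenated."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    head_dim = len(q[0]) // num_heads
    out = [[] for _ in x]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        for i, q_row in enumerate(q):
            scores = [
                sum(a * b for a, b in zip(q_row[lo:hi], k_row[lo:hi]))
                / math.sqrt(head_dim)
                for k_row in k
            ]
            weights = softmax(scores)
            out[i].extend(
                sum(w * v_row[j] for w, v_row in zip(weights, v))
                for j in range(lo, hi)
            )
    return out


def classify(x, params, num_heads):
    """Attention, mean pooling over positions, then one linear unit + sigmoid."""
    attended = multi_head_self_attention(
        x, params["wq"], params["wk"], params["wv"], num_heads
    )
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    logit = sum(w * p for w, p in zip(params["w_out"], pooled)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))


def demo(seq_len=4, max_len=6, dim=8, num_heads=2, seed=0):
    """Run the pipeline on toy sizes with random stand-in embeddings and weights."""
    rng = random.Random(seed)

    def rand_matrix(n, m):
        return [[rng.uniform(-0.1, 0.1) for _ in range(m)] for _ in range(n)]

    fake_embedding = rand_matrix(seq_len, dim)  # stand-in for ESM2 output
    x = pad_or_truncate(fake_embedding, max_len=max_len, dim=dim)
    params = {
        "wq": rand_matrix(dim, dim),
        "wk": rand_matrix(dim, dim),
        "wv": rand_matrix(dim, dim),
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(dim)],
        "b_out": 0.0,
    }
    return classify(x, params, num_heads)


if __name__ == "__main__":
    print(demo())  # a probability between 0 and 1 for one body fluid
```

The demo uses toy dimensions because pure-Python attention over a full 1000 x 480 input would be slow; a practical version would use a tensor library and one such binary classifier per body fluid.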
Pages: 13