A robust protein language model for SARS-CoV-2 protein-protein interaction network prediction

Times cited: 8
Author
Ozger, Zeynep Banu [1]
Affiliation
[1] Sutcu Imam Univ, Dept Comp Engn, TR-46040 Kahramanmaras, Turkiye
Keywords
Protein-protein interaction; Protein language model; SARS-CoV-2; Virus-host interaction; Natural language processing
DOI
10.1016/j.artmed.2023.102574
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Protein-protein interaction is one of the ways viruses interact with their hosts. Identifying protein-protein interactions (PPIs) between viruses and hosts therefore helps explain how viral proteins function, how viruses replicate, and how they cause disease. SARS-CoV-2 is a novel virus of the coronavirus family that emerged in 2019 and caused a worldwide pandemic. Detecting the human proteins that interact with this new strain plays an important role in monitoring the cellular processes of virus-associated infection. In this study, a natural language processing-based collective learning method is proposed for predicting potential SARS-CoV-2-human PPIs. Protein language models were obtained with the prediction-based Word2Vec and Doc2Vec embedding methods and the frequency-based tf-idf method. Known interactions were represented with the proposed language models and with traditional feature extraction methods (conjoint triad and repeat pattern), and their performances were compared. Models were trained on the interaction data with support vector machine (SVM), artificial neural network (ANN), k-nearest neighbor (KNN), naive Bayes (NB), decision tree (DT), and ensemble algorithms. Experimental results show that protein language models are a promising protein representation method for PPI prediction. The language model based on term frequency-inverse document frequency (tf-idf) achieved an error of 1.4% in SARS-CoV-2 PPI prediction. Additionally, the decisions of the high-performing learning models for the different feature extraction methods were combined with a collective voting approach to make new interaction predictions. By combining the models' decisions, 285 new potential interactions were predicted for 10,000 human proteins.
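The abstract names its ingredients (a tf-idf protein language model, conjoint triad features, and a voting step over several classifiers) without implementation details. As a minimal, standard-library-only sketch of these general techniques, and not the author's code, the following Python tokenizes a protein sequence into overlapping 3-mers and weights them with tf-idf, builds a conjoint triad vector, concatenates virus and human vectors into one pair feature, and combines binary model decisions by majority vote. Assumptions: the 3-mer tokenization, the tf × log(N/df) idf variant, min-max scaling of the triad counts, and simple majority voting are illustrative choices that may differ from the paper, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Illustrative sketch only: k=3 tokenization, the idf variant, min-max scaling
# and the majority-vote rule are assumptions, not the paper's stated settings.

# Seven-class amino acid grouping commonly used for conjoint triad features.
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CT_CLASSES) for aa in group}


def kmers(seq, k=3):
    """Split a protein sequence into overlapping k-mers (its "words")."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf(corpus_tokens):
    """Return one {k-mer: weight} dict per sequence, weight = tf * log(N / df)."""
    n_docs = len(corpus_tokens)
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))  # document frequency: count each k-mer once per sequence
    vectors = []
    for tokens in corpus_tokens:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def conjoint_triad(seq):
    """343-dim vector of class-triad counts, min-max scaled to [0, 1]."""
    classes = [AA_TO_CLASS[aa] for aa in seq.upper() if aa in AA_TO_CLASS]
    counts = Counter(zip(classes, classes[1:], classes[2:]))
    vec = [counts[triad] for triad in product(range(7), repeat=3)]
    lo, hi = min(vec), max(vec)
    if hi == lo:  # e.g. a sequence shorter than 3 residues: avoid division by zero
        return [0.0] * len(vec)
    return [(v - lo) / (hi - lo) for v in vec]


def pair_features(virus_seq, human_seq):
    """Represent a virus-human pair by concatenating the two triad vectors (686 dims)."""
    return conjoint_triad(virus_seq) + conjoint_triad(human_seq)


def majority_vote(decisions):
    """Combine 0/1 decisions from several models; a tie counts as no interaction."""
    return int(sum(decisions) > len(decisions) / 2)


if __name__ == "__main__":
    seqs = ["MKTAYIAKQR", "MKTAYLLKQR", "GAVILFPYMT"]
    vectors = tfidf([kmers(s) for s in seqs])
    print(round(vectors[0]["MKT"], 4))
    print(len(pair_features(seqs[0], seqs[2])))
    print(majority_vote([1, 1, 0]))
```

In this toy run, the 3-mer "MKT" occurs in two of the three sequences, so its weight in the first sequence is (1/8)·log(3/2), while a k-mer present in every sequence would get weight 0. In practice, a library implementation such as scikit-learn's `TfidfVectorizer` with a callable analyzer would usually replace the hand-written tf-idf.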
Pages: 14
Related papers (50 total)
  • [21] The intraviral protein-protein interaction of SARS-CoV-2 reveals the key role of N protein in virus-like particle assembly
    Chen, Minghai
    Yan, Chuang
    Qin, Fujun
    Zheng, Luping
    Zhang, Xian-En
    INTERNATIONAL JOURNAL OF BIOLOGICAL SCIENCES, 2021, 17 (14): 3889 - 3897
  • [22] Pooled PPIseq: Screening the SARS-CoV-2 and human interface with a scalable multiplexed protein-protein interaction assay platform
    Miller, Darach
    Dziulko, Adam
    Levy, Sasha
    PLOS ONE, 2025, 20 (01)
  • [23] A Bacterial Cell-Based Assay To Study SARS-CoV-2 Protein-Protein Interactions
    Springstein, Benjamin L.
    Deighan, Padraig
    Grabe, Grzegorz J.
    Hochschild, Ann
    MBIO, 2021, 12 (06)
  • [24] Cascading from SARS-CoV-2 to Parkinson's Disease through Protein-Protein Interactions
    Estrada, Ernesto
    VIRUSES-BASEL, 2021, 13 (05)
  • [25] SARS-CoV-2 nucleocapsid protein triggers hyperinflammation via protein-protein interaction-mediated intracellular Cl− accumulation in respiratory epithelium
    Chen, Lei
    Guan, Wei-Jie
    Qiu, Zhuo-Er
    Xu, Jian-Bang
    Bai, Xu
    Hou, Xiao-Chun
    Sun, Jing
    Qu, Su
    Huang, Ze-Xin
    Lei, Tian-Lun
    Huang, Zi-Yang
    Zhao, Jincun
    Zhu, Yun-Xin
    Ye, Ke-Nan
    Lun, Zhao-Rong
    Zhou, Wen-Liang
    Zhong, Nan-Shan
    Zhang, Yi-Lin
    SIGNAL TRANSDUCTION AND TARGETED THERAPY, 7
  • [26] A Deep Integrated Framework for Predicting SARS-CoV2-Human Protein-Protein Interaction
    Ray, Sumanta
    Lall, Snehalika
    Bandyopadhyay, Sanghamitra
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2022, 6 (06): 1463 - 1472
  • [27] Probabilistic model of the human protein-protein interaction network
    Rhodes, Daniel R
    Tomlins, Scott A
    Varambally, Sooryanarayana
    Mahavisno, Vasudeva
    Barrette, Terrence
    Kalyana-Sundaram, Shanker
    Ghosh, Debashis
    Pandey, Akhilesh
    Chinnaiyan, Arul M
    NATURE BIOTECHNOLOGY, 2005, 23 (08): 951 - 959
  • [29] Conserved network motifs allow protein-protein interaction prediction
    Albert, I
    Albert, R
    BIOINFORMATICS, 2004, 20 (18): 3346 - 3352
  • [30] Protein function prediction using neighbor relativity in protein-protein interaction network
    Moosavi, Sobhan
    Rahgozar, Masoud
    Rahimi, Amir
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2013, 43: 11 - 16