PEDL: extracting protein-protein associations using deep language models and distant supervision

Cited by: 6
Authors
Weber, Leon [1, 2]
Thobe, Kirsten [2]
Lozano, Oscar Arturo Migueles [2]
Wolf, Jana [2]
Leser, Ulf [1]
Affiliations
[1] Humboldt Univ, Comp Sci Dept, D-10099 Berlin, Germany
[2] Max Delbruck Ctr Mol Med, Grp Math Modelling Cellular Proc, Helmholtz Assoc, D-13125 Berlin, Germany
Keywords
CYCLOOXYGENASE-2; EXPRESSION; NETWORK; COMPLEX; CELLS;
DOI
10.1093/bioinformatics/btaa430
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: A significant portion of molecular biology investigates signalling pathways and thus depends on an up-to-date and complete resource of functional protein-protein associations (PPAs) that constitute such pathways. Despite extensive curation efforts, major pathway databases are still notoriously incomplete. Relation extraction can help to gather such pathway information from biomedical publications. Current methods for extracting PPAs typically rely exclusively on rare manually labelled data which severely limits their performance.
Results: We propose PPA Extraction with Deep Language (PEDL), a method for predicting PPAs from text that combines deep language models and distant supervision. Due to the reliance on distant supervision, PEDL has access to an order of magnitude more training data than methods solely relying on manually labelled annotations. We introduce three different datasets for PPA prediction and evaluate PEDL for the two subtasks of predicting PPAs between two proteins, as well as identifying the text spans stating the PPA. We compared PEDL with a recently published state-of-the-art model and found that on average PEDL performs better in both tasks on all three datasets. An expert evaluation demonstrates that PEDL can be used to predict PPAs that are missing from major pathway databases and that it correctly identifies the text spans supporting the PPA.
Pages: 490-498
Number of pages: 9
Related papers
50 in total
  • [1] Interfacial Protein-Protein Associations
    Langdon, Blake B.
    Kastantin, Mark
    Walder, Robert
    Schwartz, Daniel K.
    BIOMACROMOLECULES, 2014, 15 (01) : 66 - 74
  • [2] Extracting Protein-Protein Interactions from MEDLINE Using Syntactic Roles
    Ahmed, Syed Toufeeq
    Davulcu, Hasan
    Baral, Chitta
    2008 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, PROCEEDINGS, 2008, : 473 - 476
  • [3] Protein Function Prediction Using Function Associations in Protein-Protein Interaction Network
    Sun, Pingping
    Tan, Xian
    Guo, Sijia
    Zhang, Jingbo
    Sun, Bojian
    Du, Ning
    Wang, Han
    Sun, Hui
    IEEE ACCESS, 2018, 6 : 30892 - 30902
  • [4] Extracting protein-protein interactions in biomedical literature using an existing syntactic parser
    Jang, Hyunchul
    Lim, Jaesoo
    Lim, Joon-Ho
    Park, Soo-Jun
    Park, Seon-Hee
    Lee, Kyu-Chul
    KNOWLEDGE DISCOVERY IN LIFE SCIENCE LITERATURE, PROCEEDINGS, 2006, 3886 : 78 - 90
  • [5] A mobile system for extracting and visualizing protein-protein interactions
    Han, K
    Kim, H
    EURASIA-ICT 2002: INFORMATION AND COMMUNICATION TECHNOLOGY, PROCEEDINGS, 2002, 2510 : 47 - 56
  • [6] DeepRank-GNN-esm: a graph neural network for scoring protein-protein models using protein language model
    Xu, Xiaotong
    Bonvin, Alexandre M. J. J.
    BIOINFORMATICS ADVANCES, 2024, 4 (01):
  • [7] Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT
    Elangovan, Aparna
    Li, Yuan
    Pires, Douglas E. V.
    Davis, Melissa J.
    Verspoor, Karin
    BMC BIOINFORMATICS, 2022, 23 (01)
  • [8] Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT
    Aparna Elangovan
    Yuan Li
    Douglas E. V. Pires
    Melissa J. Davis
    Karin Verspoor
    BMC Bioinformatics, 23
  • [9] Prediction of protein-protein interactions using distant conservation of sequence patterns and structure relationships
    Espadaler, J
    Romero-Isart, O
    Jackson, RM
    Oliva, B
    BIOINFORMATICS, 2005, 21 (16) : 3360 - 3368
  • [10] Finding correct protein-protein docking models using ProQDock
    Basu, Sankar
    Wallner, Bjorn
    BIOINFORMATICS, 2016, 32 (12) : i262 - i270