PEDL: extracting protein-protein associations using deep language models and distant supervision

Cited by: 6
Authors
Weber, Leon [1, 2]
Thobe, Kirsten [2]
Lozano, Oscar Arturo Migueles [2]
Wolf, Jana [2]
Leser, Ulf [1]
Affiliations
[1] Humboldt Univ, Comp Sci Dept, D-10099 Berlin, Germany
[2] Max Delbruck Ctr Mol Med, Grp Math Modelling Cellular Proc, Helmholtz Assoc, D-13125 Berlin, Germany
Keywords
CYCLOOXYGENASE-2; EXPRESSION; NETWORK; COMPLEX; CELLS
DOI
10.1093/bioinformatics/btaa430
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: A significant portion of molecular biology investigates signalling pathways and thus depends on an up-to-date and complete resource of the functional protein-protein associations (PPAs) that constitute such pathways. Despite extensive curation efforts, major pathway databases are still notoriously incomplete. Relation extraction can help gather such pathway information from biomedical publications. Current methods for extracting PPAs typically rely exclusively on scarce manually labelled data, which severely limits their performance.

Results: We propose PPA Extraction with Deep Language (PEDL), a method for predicting PPAs from text that combines deep language models and distant supervision. Because it relies on distant supervision, PEDL has access to an order of magnitude more training data than methods that rely solely on manually labelled annotations. We introduce three different datasets for PPA prediction and evaluate PEDL on two subtasks: predicting PPAs between two proteins, and identifying the text spans stating the PPA. We compared PEDL with a recently published state-of-the-art model and found that, on average, PEDL performs better in both tasks on all three datasets. An expert evaluation demonstrates that PEDL can be used to predict PPAs that are missing from major pathway databases and that it correctly identifies the text spans supporting the PPA.
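Distant supervision, as mentioned in the abstract, derives training labels automatically by aligning text with an existing resource (here, a pathway database) instead of relying on manual annotation. The following is a minimal, generic sketch of that labelling step, not PEDL's actual pipeline. It assumes protein mentions have already been recognised and normalised to identifiers, treats associations as unordered pairs (real PPAs may be directed and typed), and uses a hypothetical function `distant_labels` with toy data.

```python
from collections import defaultdict
from itertools import combinations


def distant_labels(sentences, known_ppas):
    """Group sentences into bags per protein pair and label each bag from a KB.

    sentences: list of (text, set_of_protein_ids); this input format is a
        hypothetical assumption, with mentions already normalised.
    known_ppas: set of frozenset({protein_a, protein_b}) taken from a
        pathway database (unordered pairs, a simplifying assumption).
    Returns a dict mapping frozenset pair -> (list of sentence texts, label).
    """
    bags = defaultdict(list)
    for text, proteins in sentences:
        # Every pair of proteins co-mentioned in a sentence becomes a candidate.
        for a, b in combinations(sorted(proteins), 2):
            bags[frozenset((a, b))].append(text)
    # A bag is positive iff its pair is already in the database,
    # regardless of what the individual sentences actually say.
    return {pair: (texts, pair in known_ppas) for pair, texts in bags.items()}


# Toy data for illustration only.
sentences = [
    ("MAPK1 phosphorylates ELK1.", {"MAPK1", "ELK1"}),
    ("MAPK1 and TP53 were measured.", {"MAPK1", "TP53"}),
    ("ELK1 activity depends on MAPK1.", {"ELK1", "MAPK1"}),
    ("TP53 is a tumour suppressor.", {"TP53"}),  # one protein: no candidate pair
]
known_ppas = {frozenset({"MAPK1", "ELK1"})}
labels = distant_labels(sentences, known_ppas)
```

In this toy run, both sentences co-mentioning ELK1 and MAPK1 form one positive bag simply because the pair is in the database, whether or not each sentence actually states the association; accepting this label noise is the usual trade-off distant supervision makes in exchange for much more training data.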
Pages: 490-498
Number of pages: 9