Multiple kernel learning in protein-protein interaction extraction from biomedical literature

Cited by: 40
Authors
Yang, Zhihao [1]
Tang, Nan [1]
Zhang, Xiao [1]
Lin, Hongfei [1]
Li, Yanpeng [1]
Yang, Zhiwei [2]
Affiliations
[1] Dalian Univ Technol, Dept Comp Sci & Engn, Dalian 116024, Peoples R China
[2] Oil Field Hosp Daqing, Dept Ultrasound, Daqing 163001, Heilongjiang, Peoples R China
Keywords
Support vector machines; Multiple kernel learning; Text mining; Information extraction; Protein-protein interaction; Corpus; Area
DOI
10.1016/j.artmed.2010.12.002
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Objective: Knowledge about protein-protein interactions (PPIs) unveils the molecular mechanisms of biological processes. The volume and content of published biomedical literature on protein interactions are expanding rapidly, making it increasingly difficult for interaction database administrators, who are responsible for content input and maintenance, to detect and manually update protein interaction information. The objective of this work is to develop an effective approach to the automatic extraction of PPI information from biomedical literature.
Methods and materials: We present a weighted multiple kernel learning-based approach for automatic PPI extraction from biomedical literature. The approach combines the following kernels: feature-based, tree, graph and part-of-speech (POS) path. In particular, we extend the shortest path-enclosed tree (SPT) and the dependency path tree to capture richer contextual information.
Results: Our experimental results show that the combination of the SPT and dependency path tree extensions improves performance by almost 0.7 percentage points in F-score and 2 percentage points in area under the receiver operating characteristic curve (AUC). Combining two or more appropriately weighted individual kernels improves performance further. In both single-corpus and cross-corpus evaluation, our combined kernel achieves state-of-the-art performance with respect to comparable evaluations, with 64.41% F-score and 88.46% AUC on the AIMed corpus.
Conclusions: Because different kernels calculate the similarity between two sentences from different aspects, our combined kernel can reduce the risk of missing important features. More specifically, we use a weighted linear combination of individual kernels instead of assigning the same weight to each individual kernel, so that each kernel that is introduced contributes incrementally to the performance improvement. In addition, the SPT and dependency path tree extensions can improve performance by including richer context information. © 2010 Elsevier B.V. All rights reserved.
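The weighted linear combination of kernels described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the helper names `normalize_kernel` and `combine_kernels` are hypothetical, the cosine normalization step is a common convention rather than something the abstract states, the weights 0.7 and 0.3 are arbitrary, and two linear kernels on random vectors stand in for the feature-based, tree, graph and POS path kernels.

```python
import numpy as np


def normalize_kernel(K):
    """Cosine-normalize a Gram matrix so that every diagonal entry becomes 1."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)


def combine_kernels(kernels, weights):
    """Return sum_m w_m * K_m, with non-negative weights rescaled to sum to 1."""
    w = np.asarray(weights, dtype=float)
    if w.ndim != 1 or len(w) != len(kernels):
        raise ValueError("need exactly one weight per kernel")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()
    return sum(wi * normalize_kernel(Ki) for wi, Ki in zip(w, kernels))


# Toy stand-ins (assumption): two linear kernels over random vectors for
# five "sentences", in place of the structured kernels named in the abstract.
rng = np.random.default_rng(0)
X_a = rng.normal(size=(5, 4))
X_b = rng.normal(size=(5, 3))
K_a = X_a @ X_a.T
K_b = X_b @ X_b.T

# Illustrative weights only; the abstract does not report the values used.
K = combine_kernels([K_a, K_b], [0.7, 0.3])
```

Because a non-negative weighted sum of positive semidefinite matrices is itself positive semidefinite, the combined matrix remains a valid kernel and could be handed to any classifier that accepts a precomputed Gram matrix, for example a support vector machine.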
Pages: 163-173
Number of pages: 11