Comparative analysis of five protein-protein interaction corpora

Cited by: 113
Authors
Pyysalo, Sampo [1]
Airola, Antti
Heimonen, Juho
Björne, Jari
Ginter, Filip
Salakoski, Tapio
Affiliations
[1] Univ Turku, TUCS, FIN-20520 Turku, Finland
Keywords
PubMed Abstract; Entity Annotation; Entity Pair; Corpus Annotation; Annotate Entity;
DOI
10.1186/1471-2105-9-S3-S6
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Growing interest in the application of natural language processing methods to biomedical text has led to an increasing number of corpora and methods targeting protein-protein interaction (PPI) extraction. However, there is no general consensus regarding PPI annotation and consequently resources are largely incompatible and methods are difficult to evaluate.
Results: We present the first comparative evaluation of the diverse PPI corpora, performing quantitative evaluation using two separate information extraction methods as well as detailed statistical and qualitative analyses of their properties. For the evaluation, we unify the corpus PPI annotations to a shared level of information, consisting of undirected, untyped binary interactions of non-static types with no identification of the words specifying the interaction, no negations, and no interaction certainty. We find that the F-score performance of a state-of-the-art PPI extraction method varies on average 19 percentage units and in some cases over 30 percentage units between the different evaluated corpora. The differences stemming from the choice of corpus can thus be substantially larger than differences between the performance of PPI extraction methods, which suggests definite limits on the ability to compare methods evaluated on different resources. We analyse a number of potential sources for these differences and identify factors explaining approximately half of the variance. We further suggest ways in which the difficulty of the PPI extraction tasks codified by different corpora can be determined to advance comparability. Our analysis also identifies points of agreement and disagreement in PPI corpus annotation that are rarely explicitly stated by the authors of the corpora.
Conclusions: Our comparative analysis uncovers key similarities and differences between the diverse PPI corpora, thus taking an important step towards standardization. In the course of this study we have created a major practical contribution in converting the corpora into a shared format. The conversion software is freely available at http://mars.cs.utu.fi/PPICorpora.
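As a rough illustration of the unification step described in the abstract, the sketch below reduces annotations to undirected, untyped entity pairs and scores a prediction set with a balanced F-score. It is not the authors' conversion software: the `Interaction` record, its field names, and the `unify` and `f_score` helpers are hypothetical, and the treatment of negation, certainty, and static relation types is left out.

```python
from typing import FrozenSet, Iterable, NamedTuple, Set


class Interaction(NamedTuple):
    """Hypothetical annotation record; the field names are illustrative only."""
    agent: str
    target: str
    itype: str = ""  # interaction type, discarded during unification


def unify(annotations: Iterable[Interaction]) -> Set[FrozenSet[str]]:
    """Reduce annotations to undirected, untyped entity pairs.

    A frozenset ignores argument order, so (A, B) and (B, A) collapse
    into the same pair, and the interaction type is simply dropped.
    """
    return {frozenset((a.agent, a.target)) for a in annotations}


def f_score(gold: Set[FrozenSet[str]], predicted: Set[FrozenSet[str]]) -> float:
    """Balanced F-score (harmonic mean of precision and recall) over pair sets."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Example: one of two predicted pairs matches one of two gold pairs.
gold = unify([Interaction("P1", "P2", "bind"), Interaction("P3", "P4", "phosphorylate")])
predicted = unify([Interaction("P2", "P1"), Interaction("P1", "P3")])
print(f_score(gold, predicted))
```

Representing each pair as a `frozenset` is one simple way to make direction irrelevant; a sorted tuple would work equally well if self-interactions need to be kept distinct.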
Pages: 11
Related papers
50 in total
  • [1] Comparative analysis of five protein-protein interaction corpora
    Sampo Pyysalo
    Antti Airola
    Juho Heimonen
    Jari Björne
    Filip Ginter
    Tapio Salakoski
    BMC Bioinformatics, 9
  • [2] Comparative analysis of protein-protein interaction networks in metastatic breast cancer
    Hozhabri, Hossein
    Dehkohneh, Roxana Sadat Ghasemi
    Razavi, Seyed Morteza
    Razavi, S. Mostafa
    Salarian, Fatemeh
    Rasouli, Azade
    Azami, Jalil
    Shiran, Melika Ghasemi
    Kardan, Zahra
    Farrokhzad, Negar
    Namini, Arsham Mikaeili
    Salari, Ali
    PLOS ONE, 2022, 17 (01):
  • [3] Comparative analysis of protein-protein interaction networks in neural differentiation mechanisms
    Moazeny, Marzieh
    Salari, Ali
    Hojati, Zohreh
    Esmaeili, Fariba
    DIFFERENTIATION, 2022, 126 : 1 - 9
  • [4] Protein-Protein Interaction Analysis by Docking
    Fink, Florian
    Ederer, Stephan
    Gronwald, Wolfram
    ALGORITHMS, 2009, 2 (01) : 429 - 436
  • [5] Network analysis of protein-protein interaction
    Chang Shan
    Gong XinQi
    Jiao Xiong
    Li ChunHua
    Chen WeiZu
    Wang CunXin
    CHINESE SCIENCE BULLETIN, 2010, 55 (09) : 814 - 822
  • [6] Protein-Protein Interaction text analysis
    Danger, Roxana
    Pla, Ferran
    Molina, Antonio
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2010, (45) : 301 - 302
  • [7] Bounded list injective homomorphism for comparative analysis of protein-protein interaction graphs
    Fagnot, Isabelle
    Lelandais, Gaelle
    Vialette, Stephane
    JOURNAL OF DISCRETE ALGORITHMS, 2008, 6 (02) : 178 - 191
  • [8] Protein-protein interaction analysis of Ter proteins
    Smidak, R.
    Aradska, J. S.
    Turkovicova, L.
    Turna, J.
    FEBS JOURNAL, 2012, 279 : 236 - 236
  • [9] Communities Analysis in Protein-protein Interaction Networks
    Li, Kan
    Pang, Yin
    2013 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2013,
  • [10] COMPUTER-ANALYSIS OF PROTEIN-PROTEIN INTERACTION
    WODAK, SJ
    JANIN, J
    JOURNAL OF MOLECULAR BIOLOGY, 1978, 124 (02) : 323 - 342