Classification of Protein-Protein Interaction Full-Text Documents Using Text and Citation Network Features

Cited by: 20
Authors
Kolchinsky, Artemy [1, 2]
Abi-Haidar, Alaa [1, 2]
Kaur, Jasleen [1]
Hamed, Ahmed Abdeen [1]
Rocha, Luis M. [1, 2]
Affiliations
[1] Indiana Univ, Sch Informat & Comp, Bloomington, IN 47408 USA
[2] FLAD Computat Biol Collaboratorium, Inst Gulbenkian Ciencia, P-2780156 Oeiras, Portugal
Keywords
Text mining; literature mining; binary classification; protein-protein interaction; citation network; INFORMATION; GENES;
DOI
10.1109/TCBB.2010.55
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
We participated (as Team 9) in the Article Classification Task of the BioCreative II.5 Challenge: binary classification of full-text documents relevant for protein-protein interaction. We used two distinct classifiers for the online and offline challenges: 1) the lightweight Variable Trigonometric Threshold (VTT) linear classifier we successfully introduced in BioCreative 2 for binary classification of abstracts and 2) a novel Naive Bayes classifier using features from the citation network of the relevant literature. We supplemented the supplied training data with full-text documents from the MIPS database. The lightweight VTT classifier was very competitive in this new full-text scenario: it was a top-performing submission in this task, taking into account the rank product of the Area Under the interpolated precision and recall Curve, Accuracy, Balanced F-Score, and Matthews Correlation Coefficient performance measures. The novel citation network classifier for the biomedical text mining domain, while not a top-performing classifier in the challenge, performed above the central tendency of all submissions, and therefore indicates a promising new avenue to investigate further in bibliome informatics.
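As a rough illustration of three of the performance measures named in the abstract, the sketch below computes Accuracy, Balanced F-Score (F1), and Matthews Correlation Coefficient from the counts of a binary confusion matrix, and then combines per-measure ranks into a rank product. It is not the authors' evaluation code: the function names `binary_metrics` and `rank_product`, the example counts, and the choice to define the rank product as the plain product of ranks (rather than, say, their geometric mean) are all assumptions made for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    Hypothetical helper for illustration; not taken from the paper.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Balanced F-score: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # Matthews Correlation Coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


def rank_product(ranks):
    """Product of a submission's ranks across measures (assumed definition).

    A lower value means the submission ranked well on every measure.
    """
    return math.prod(ranks)


# Made-up counts, purely for demonstration.
scores = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print(scores)

# Made-up ranks of one submission on four measures.
print(rank_product([1, 2, 1, 3]))
```

Ranking by a product of ranks rewards submissions that do consistently well across all measures, since a single poor rank multiplies through the whole score.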
Pages: 400-411
Number of pages: 12