Classification of Protein-Protein Interaction Full-Text Documents Using Text and Citation Network Features

Cited by: 21
Authors
Kolchinsky, Artemy [1, 2]
Abi-Haidar, Alaa [1, 2]
Kaur, Jasleen [1]
Hamed, Ahmed Abdeen [1]
Rocha, Luis M. [1, 2]
Affiliations
[1] Indiana Univ, Sch Informat & Comp, Bloomington, IN 47408 USA
[2] FLAD Computat Biol Collaboratorium, Inst Gulbenkian Ciencia, P-2780156 Oeiras, Portugal
Keywords
Text mining; literature mining; binary classification; protein-protein interaction; citation network; INFORMATION; GENES;
DOI
10.1109/TCBB.2010.55
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
We participated (as Team 9) in the Article Classification Task of the BioCreative II.5 Challenge: binary classification of full-text documents relevant for protein-protein interaction. We used two distinct classifiers for the online and offline challenges: 1) the lightweight Variable Trigonometric Threshold (VTT) linear classifier we successfully introduced in BioCreative 2 for binary classification of abstracts and 2) a novel Naive Bayes classifier using features from the citation network of the relevant literature. We supplemented the supplied training data with full-text documents from the MIPS database. The lightweight VTT classifier was very competitive in this new full-text scenario: it was a top-performing submission in this task, taking into account the rank product of the Area Under the interpolated precision and recall Curve, Accuracy, Balanced F-Score, and Matthews Correlation Coefficient performance measures. The novel citation network classifier for the biomedical text mining domain, while not a top-performing classifier in the challenge, performed above the central tendency of all submissions, and therefore indicates a promising new avenue to investigate further in bibliome informatics.
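The abstract names several performance measures and a rank product over them. As a rough illustration only, the sketch below computes Accuracy, the balanced F-Score, and the Matthews Correlation Coefficient from confusion-matrix counts, and combines per-measure ranks with a rank product. The function names, the example counts, and the choice of the geometric mean as the rank product are assumptions made here for illustration; they are not taken from the paper or from the challenge's evaluation code, and the area under the interpolated precision-recall curve is omitted.

```python
import math


def accuracy(tp, fp, tn, fn):
    """Fraction of all documents that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def balanced_f_score(tp, fp, fn):
    """Harmonic mean of precision and recall (F1)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_cc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient, in the range [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def rank_product(ranks):
    """Geometric mean of one submission's ranks across measures.

    Assumption: rank 1 is best, so a lower rank product is better.
    """
    return math.prod(ranks) ** (1.0 / len(ranks))


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, not results from the paper.
    tp, fp, tn, fn = 40, 10, 45, 5
    print("Accuracy:", accuracy(tp, fp, tn, fn))
    print("F-Score:", balanced_f_score(tp, fp, fn))
    print("MCC:", matthews_cc(tp, fp, tn, fn))
    # Hypothetical ranks of one submission on four measures.
    print("Rank product:", rank_product([1, 2, 2, 3]))
```

Because every submission is ranked on the same number of measures, using the plain product of ranks instead of the geometric mean would give the same ordering of submissions.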
Pages: 400-411
Page count: 12
Related papers
50 in total
  • [41] On the creation of hypertext links in full-text documents: Measurement of retrieval effectiveness
    Ellis, D
    Furner, J
    Willett, P
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1996, 47 (04): : 287 - 300
  • [42] Evaluating GPT and BERT models for protein-protein interaction identification in biomedical text
    Rehana, Hasin
    Cam, Nur Bengisu
    Basmaci, Mert
    Zheng, Jie
    Jemiyo, Christianah
    He, Yongqun
    Ozgur, Arzucan
    Hur, Junguk
    BIOINFORMATICS ADVANCES, 2024, 4 (01):
  • [43] Big Data Full-Text Search Index Minimization Using Text Summarization
    Iqbal, Waheed
    Malik, Waqas Ilyas
    Bukhari, Faisal
    Almustafa, Khaled Mohamad
    Nawaz, Zubiar
    INFORMATION TECHNOLOGY AND CONTROL, 2021, 50 (02): : 375 - 389
  • [44] Using distant supervised learning to identify protein subcellular localizations from full-text scientific articles
    Zheng, Wu
    Blake, Catherine
    JOURNAL OF BIOMEDICAL INFORMATICS, 2015, 57 : 134 - 144
  • [45] WHY FULL-TEXT MISSES SOME RELEVANT DOCUMENTS - AN ANALYSIS OF DOCUMENTS NOT RETRIEVED BY CCML OR MEDIS
    SIEVERT, M
    MCKININ, EJ
    PROCEEDINGS OF THE ASIS ANNUAL MEETING, 1989, 26 : 34 - 39
  • [46] Protein-Protein Interactions Classification from Text via Local Learning with Class Priors
    He, Yulan
    Lin, Chenghua
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, 2010, 5723 : 182 - 191
  • [47] Full-Text Search Extensions for JSON Documents: Design Goals and Implementations
    Petkovic, Dusan
    BEYOND DATABASES, ARCHITECTURES AND STRUCTURES: FACING THE CHALLENGES OF DATA PROLIFERATION AND GROWING VARIETY, 2018, 928 : 283 - 293
  • [48] Using Automatic Features for Text-image Classification in Amharic Documents
    Belay, Birhanu
    Habtegebrial, Tewodros
    Belay, Gebeyehu
    Stricker, Didier
    ICPRAM: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS, 2020, : 440 - 445
  • [49] Expanded information retrieval using full-text searching
    Kostoff, Ronald N.
    JOURNAL OF INFORMATION SCIENCE, 2010, 36 (01) : 104 - 113
  • [50] Madadoc, a digital library with full-text documents on rural development and the environment in Madagascar
    Rahaingo-Razafimbelo, Marie-Marcelline
    Randrianato, Haja
    CAHIERS AGRICULTURES, 2011, 20 (04) : 301 - 309