Imbalanced Text Classification on Host Pathogen Protein-Protein Interaction Documents

Cited by: 3
Authors
Xu, Guixian [1, 2]
Niu, Zhendong [2]
Gao, Xu [4]
Liu, Hongfang [3]
Affiliations
[1] Minzu Univ, Coll Informat Engn, Beijing, Peoples R China
[2] Beijing Inst Technol, Coll Comp Sci, Beijing, Peoples R China
[3] Georgetown Univ, Med Ctr, Dept Bio3, Washington, DC 20007 USA
[4] North China Grid Co Ltd, Beijing, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
imbalanced text classification; machine learning; protein-protein interaction;
DOI
10.1109/ICCAE.2010.5451921
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Protein-protein interactions (PPIs) are important in understanding the fundamental processes governing cell biology. However, a large number of scientific findings about PPIs are buried in the growing volume of biomedical literature. Document classification systems have been shown to have the potential to accelerate the curation process by retrieving PPI-related documents. In practice, however, only a small number of positive documents can usually be obtained, either manually or from PPI knowledge bases with literature-based evidence, whereas negative documents are plentiful. In this paper, we investigate the effects of feature selection, feature weighting, and the kernel function of Support Vector Machines (SVMs) on imbalanced two-class classification, based on 1360 host-pathogen protein-protein interaction documents. The results show that a suitable feature weighting approach is an important factor in improving classification performance. Adjusting the cost-sensitive parameter of an SVM with a radial basis function (RBF) kernel can decrease the minority-class misclassification ratio and increase classification accuracy on imbalanced document classification. An automated classification system to identify MEDLINE abstracts referring to host-pathogen protein-protein interactions can be developed based on these experiments.
Pages: 418-422
Page count: 5