Imbalanced Text Classification on Host Pathogen Protein-Protein Interaction Documents

Cited by: 3
Authors
Xu, Guixian [1 ,2 ]
Niu, Zhendong [2 ]
Gao, Xu [4 ]
Liu, Hongfang [3 ]
Affiliations
[1] Minzu Univ, Coll Informat Engn, Beijing, Peoples R China
[2] Beijing Inst Technol, Coll Comp Sci, Beijing, Peoples R China
[3] Georgetown Univ, Med Ctr, Dept Bio3, Washington, DC 20007 USA
[4] North China Grid Co Ltd, Beijing, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
imbalanced text classification; machine learning; protein-protein interaction;
DOI
10.1109/ICCAE.2010.5451921
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
important in understanding the fundamental processes governing cell biology. However, a large number of scientific findings about PPIs are buried in the growing volume of biomedical literature. Document classification systems have been shown to have the potential to accelerate the curation process by retrieving PPI-related documents. In practice, however, only a small number of positive documents can usually be obtained, either manually or from PPI knowledge bases with literature-based evidence, while negative documents are plentiful. In this paper, we investigate the effects of feature selection, feature weighting, and the kernel function of Support Vector Machines (SVMs) on imbalanced two-class classification, using 1360 host-pathogen protein-protein interaction documents. The results show that a suitable feature weighting approach is an important factor in improving classification performance. Adjusting the cost-sensitive parameter of an SVM with a radial basis function (RBF) kernel can decrease the minority-class misclassification ratio and increase classification accuracy on imbalanced document classification. An automated classification system to identify MEDLINE abstracts referring to host-pathogen protein-protein interactions can be developed based on these experiments.
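The two ideas the abstract highlights, feature weighting and cost-sensitive handling of the minority class, can be illustrated with a minimal sketch. The abstract does not say which weighting scheme or cost setting the authors used, so everything below is an assumption for illustration: TF-IDF stands in for "feature weighting", and an inverse-frequency heuristic stands in for the per-class misclassification cost that a cost-sensitive SVM would apply. The function names `tfidf`, `balanced_class_weights`, and `weighted_error` are hypothetical, and no SVM is trained here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Weight each term by term frequency times inverse document frequency.

    docs is a list of token lists; the result is one {term: weight} dict
    per document. TF-IDF is an assumed example of a weighting scheme.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


def balanced_class_weights(labels):
    """Assumed heuristic: weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * k) for cls, k in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Error rate in which each mistake costs the weight of its true class."""
    total = sum(weights[t] for t in y_true)
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Toy imbalanced set: 2 positive (PPI-related) and 8 negative documents.
labels = [1] * 2 + [0] * 8
weights = balanced_class_weights(labels)

# A classifier that always predicts "negative" looks good on plain accuracy
# but is penalized heavily once minority mistakes cost more.
always_negative = [0] * len(labels)
plain_error = sum(t != p for t, p in zip(labels, always_negative)) / len(labels)
cost_sensitive_error = weighted_error(labels, always_negative, weights)

# Toy tokenized abstracts for the weighting step.
docs = [["host", "pathogen", "binds"], ["host", "cell"], ["host", "pathogen"]]
doc_weights = tfidf(docs)
```

On this toy set the always-negative classifier has a plain error of 0.2 but a cost-weighted error of 0.5, which is the kind of gap that motivates raising the misclassification cost of the minority class. In the TF-IDF step, a term that occurs in every document (here `host`) receives weight zero, while rarer terms are weighted up.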
Pages: 418-422
Page count: 5