Using typical testors for feature selection in text categorization

Authors
Pons-Porrata, Aurora [1]
Gil-Garcia, Reynaldo [1]
Berlanga-Llavori, Rafael [2]
Affiliations
[1] Univ Oriente, Ctr Pattern Recognit & Data Mining, Santiago De Cuba, Cuba
[2] Univ Jaume I, Castellon de La Plana, Spain
Keywords
feature selection; typical testors; text categorization
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A major difficulty of text categorization problems is the high dimensionality of the feature space. Thus, feature selection is often performed in order to increase both the efficiency and effectiveness of the classification. In this paper, we propose a feature selection method based on Testor Theory. This criterion takes into account inter-feature relationships. We experimentally compared our method with the widely used information gain using two well-known classification algorithms: k-nearest neighbour and Support Vector Machine. Two benchmark text collections were chosen as the testbeds: Reuters-21578 and Reuters Corpus Version 1 (RCV1v2). We found that our method consistently outperformed information gain for both classifiers and both data collections, especially when aggressive feature selection is carried out.
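The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the underlying notion, not the authors' method. It assumes the usual Testor Theory definitions: a testor is a feature subset on which no two objects from different classes look identical, and a typical testor is a testor with no proper subset that is also a testor. The function names, the toy term-presence matrix, and the use of strict equality as the comparison criterion are all assumptions made for illustration. The search is brute force and exponential in the number of features, so it is usable only on tiny examples.

```python
from itertools import combinations


def is_testor(features, X, y):
    """Return True if no two objects from different classes agree
    on every feature in `features` (strict equality is assumed)."""
    for i, j in combinations(range(len(X)), 2):
        if y[i] != y[j] and all(X[i][f] == X[j][f] for f in features):
            return False
    return True


def typical_testors(X, y):
    """Enumerate all typical (irreducible) testors by brute force,
    visiting feature subsets in order of increasing size."""
    n_features = len(X[0])
    found = []
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            # A superset of a known typical testor is a testor,
            # but it is reducible, so it cannot be typical.
            if any(set(t) <= set(subset) for t in found):
                continue
            if is_testor(subset, X, y):
                found.append(subset)
    return found


# Hypothetical toy data: rows are documents, columns are term-presence flags.
X = [
    [1, 0, 0],  # class "a"
    [1, 1, 1],  # class "a"
    [0, 1, 0],  # class "b"
    [0, 0, 1],  # class "b"
]
y = ["a", "a", "b", "b"]

print(typical_testors(X, y))  # [(0,), (1, 2)]
```

In this toy case feature 0 separates the two classes on its own, while features 1 and 2 do so only jointly, which illustrates how a testor-based criterion can account for relationships between features rather than scoring each feature in isolation.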
Pages: 643 / +
Page count: 2