Oscillating feature subset search algorithm for text categorization

Authors
Novovicova, Jana [1]
Somol, Petr
Pudil, Pavel
Affiliations
[1] Acad Sci Czech Republ, Inst Informat Theory & Automat, Dept Pattern Recognit, Prague, Czech Republic
[2] Prague Univ Econ, Fac Management, Prague, Czech Republic
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A major characteristic of text document categorization problems is the extremely high dimensionality of text data. In this paper we explore the usability of the Oscillating Search algorithm for feature/word selection in text categorization. We propose using the multiclass Bhattacharyya distance for the multinomial model as the global feature subset selection criterion for reducing the dimensionality of the bag-of-words vector representation of documents. This criterion takes inter-feature relationships into consideration. We experimentally compare three subset selection procedures: the commonly used best individual feature selection based on information gain, best individual feature selection based on individual Bhattacharyya distance, and Oscillating Search maximizing the Bhattacharyya distance on groups of features. The resulting feature subsets are then tested on the standard Reuters data with two classifiers: multinomial Bayes and linear SVM. The experimental results illustrate that using a non-trivial feature selection algorithm is not only computationally feasible but also brings a substantial improvement in classification accuracy over traditional methods based on individual feature evaluation.
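To make the idea of a subset-level criterion concrete, the following is a minimal Python sketch, not the authors' implementation. It rests on several assumptions that the abstract does not state: the multiclass criterion is taken as a prior-weighted sum of pairwise Bhattacharyya distances, words outside the candidate subset are pooled into a single "rest" bin, and the search is a simplified oscillating scheme built from greedy add and remove steps. All function names (`bhattacharyya`, `criterion`, `oscillating_search`, and the helpers) and the toy probabilities are hypothetical.

```python
import math
from itertools import combinations

def bhattacharyya(p, q, subset):
    """Distance between two word distributions restricted to `subset`.
    Assumption: all words outside the subset are pooled into one "rest" bin."""
    bc = sum(math.sqrt(p[i] * q[i]) for i in subset)
    rest_p = max(0.0, 1.0 - sum(p[i] for i in subset))
    rest_q = max(0.0, 1.0 - sum(q[i] for i in subset))
    bc += math.sqrt(rest_p * rest_q)
    # Clamp to (0, 1] to guard against rounding and log(0).
    return -math.log(min(1.0, max(bc, 1e-300)))

def criterion(subset, class_probs, priors):
    """Assumed multiclass form: prior-weighted sum over all class pairs."""
    return sum(priors[a] * priors[b]
               * bhattacharyya(class_probs[a], class_probs[b], subset)
               for a, b in combinations(range(len(priors)), 2))

def add_best(subset, n_features, score):
    """Greedily add the single feature that maximizes the criterion."""
    candidates = [f for f in range(n_features) if f not in subset]
    return max((subset | {f} for f in candidates), key=score)

def remove_worst(subset, score):
    """Greedily remove the single feature whose removal hurts least."""
    return max((subset - {f} for f in subset), key=score)

def swing(subset, n_features, score, depth, down_first):
    """One oscillation: `depth` removals then `depth` additions, or the reverse."""
    steps = ("remove", "add") if down_first else ("add", "remove")
    for step in steps:
        for _ in range(depth):
            if step == "add":
                subset = add_best(subset, n_features, score)
            else:
                subset = remove_worst(subset, score)
    return subset

def oscillating_search(n_features, d, score, max_depth=2):
    """Simplified oscillating search for a subset of exactly d features."""
    # Start from the d individually best features.
    ranked = sorted(range(n_features),
                    key=lambda f: score(frozenset([f])), reverse=True)
    best = frozenset(ranked[:d])
    # Keep swings feasible: never empty the subset or run out of candidates.
    max_depth = min(max_depth, d - 1, n_features - d)
    depth = 1
    while depth <= max_depth:
        improved = False
        for down_first in (True, False):
            candidate = swing(best, n_features, score, depth, down_first)
            if score(candidate) > score(best) + 1e-12:
                best, improved = candidate, True
        # Reset the swing depth after an improvement, otherwise deepen it.
        depth = 1 if improved else depth + 1
    return best

# Toy example: 3 classes, 5 words (made-up probabilities, each row sums to 1).
class_probs = [
    [0.40, 0.30, 0.10, 0.10, 0.10],
    [0.10, 0.30, 0.40, 0.10, 0.10],
    [0.10, 0.30, 0.10, 0.40, 0.10],
]
priors = [1 / 3, 1 / 3, 1 / 3]

def score(subset):
    return criterion(subset, class_probs, priors)

selected = oscillating_search(n_features=5, d=2, score=score)
```

Because the "rest" bin depends on every selected word, the value of adding a word depends on which words are already in the subset. This is one way a criterion can reflect inter-feature relationships, in contrast to ranking each word in isolation.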
Pages: 578-587
Number of pages: 10