Two step POS selection for SVM based text categorization

Cited by: 0
Authors
Masuyama, T [1]
Nakagawa, H [1]
Affiliation
[1] Univ Tokyo, Ctr Informat Technol, Tokyo 1130033, Japan
Source
Keywords
text categorization; text classification; support vector machine (SVM); parts of speech (POS); variable cascaded feature selection (VCFS);
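DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract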
Although many researchers have verified the superiority of the Support Vector Machine (SVM) on text categorization tasks, some recent papers have reported much lower performance for SVM-based text categorization methods when words of all parts of speech (POS) are used as input and large numbers of training documents are processed. This was caused by overfitting: the SVM sometimes selected unsuitable support vectors for a category in the training set. To avoid overfitting, we propose a two-step text categorization method with variable cascaded feature selection (VCFS) using SVM. At each step of the cascade, the VCFS method selects, for each category, the best pair of number of words and POS combination. We exploit the differences among the words with the highest mutual information for each category under each POS combination. Our experiments confirmed the validity of the VCFS method in comparison with other SVM-based text categorization methods: its macro-averaged F1 measure (64.8%) was significantly better than any previously reported F1 measure, while its micro-averaged F1 measure (85.4%) was similar to the reported values.
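The abstract does not give implementation details, so the following is only a minimal sketch of one ingredient it names: ranking words by mutual information with a category, restricted to a chosen POS combination, and keeping the top k. It is not the authors' VCFS implementation. The function names (`mutual_information`, `select_words`), the POS tag names, and the toy corpus are all hypothetical and invented for illustration. Mutual information is computed here between word presence and category membership from document counts.

```python
import math
from collections import Counter


def mutual_information(docs, labels, category):
    """Return {word: MI in bits} between word presence and membership in `category`.

    `docs` is a list of word lists; `labels` holds one category per document.
    """
    n = len(docs)
    in_cat = [label == category for label in labels]
    n_cat = sum(in_cat)
    df = Counter()      # number of documents containing the word
    df_cat = Counter()  # number of in-category documents containing the word
    for words, is_member in zip(docs, in_cat):
        for w in set(words):
            df[w] += 1
            if is_member:
                df_cat[w] += 1
    scores = {}
    for w in df:
        n11 = df_cat[w]        # word present, document in category
        n10 = df[w] - n11      # word present, document not in category
        n01 = n_cat - n11      # word absent, document in category
        n00 = n - n11 - n10 - n01
        cells = (
            (n11, df[w], n_cat),
            (n10, df[w], n - n_cat),
            (n01, n - df[w], n_cat),
            (n00, n - df[w], n - n_cat),
        )
        mi = 0.0
        for joint, word_margin, cat_margin in cells:
            if joint > 0:  # empty cells contribute nothing
                mi += (joint / n) * math.log2(n * joint / (word_margin * cat_margin))
        scores[w] = mi
    return scores


def select_words(tagged_docs, labels, category, pos_set, k):
    """Keep only words whose POS tag is in `pos_set`, then return the k words
    with the highest mutual information for `category` (ties broken by word)."""
    filtered = [[w for w, pos in doc if pos in pos_set] for doc in tagged_docs]
    scores = mutual_information(filtered, labels, category)
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]


# Hypothetical toy corpus of (word, POS) tokens, for illustration only.
tagged_docs = [
    [("wheat", "NOUN"), ("export", "NOUN"), ("rise", "VERB")],
    [("wheat", "NOUN"), ("harvest", "NOUN"), ("fall", "VERB")],
    [("bank", "NOUN"), ("rate", "NOUN"), ("rise", "VERB")],
    [("bank", "NOUN"), ("loan", "NOUN"), ("fall", "VERB")],
]
labels = ["grain", "grain", "money", "money"]

print(select_words(tagged_docs, labels, "grain", {"NOUN"}, 2))
```

In a cascade of the kind the abstract describes, a selection like this would presumably be repeated for several candidate POS combinations and values of k per category, with the best pair chosen on held-out data; that outer loop is an assumption and is not shown here.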
Pages: 373-379
Page count: 7