Feature selection using support vector machines.

Cited by: 0
Authors
Brank, J [1 ]
Grobelnik, M [1 ]
Milic-Frayling, N [1 ]
Mladenic, D [1 ]
Affiliations
[1] Jozef Stefan Inst, Ljubljana, Slovenia
Source
DATA MINING III | 2002, Vol. 6
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Text categorization is the task of classifying natural language documents into a set of predefined categories. Under the vector space model, documents are typically represented by sparse vectors: each word in the vocabulary is mapped to one coordinate axis, and its occurrence in a document gives rise to one nonzero component in the vector representing that document. When training classifiers on large collections of documents, both the time and memory requirements of processing these vectors may be prohibitive. This calls for a feature selection method, not only to reduce the number of features but also to increase the sparsity of document vectors. We propose a feature selection method based on linear Support Vector Machines (SVMs). First, we train a linear SVM on a subset of the training data and retain only those features that correspond to highly weighted components (in the absolute value sense) of the normal to the resulting hyperplane separating positive and negative examples. This reduced feature space is then used to train a classifier over a larger training set, because more documents now fit into the same amount of memory. In our experiments we compare the effectiveness of SVM-based feature selection with that of more traditional feature selection methods, such as odds ratio and information gain, in achieving the desired tradeoff between vector sparsity and classification performance. Experimental results indicate that, at the same level of vector sparsity, feature selection based on SVM normals yields better classification performance than odds ratio- or information gain-based feature selection when linear SVM classifiers are used.
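The selection procedure described in the abstract can be illustrated briefly. The sketch below is a reconstruction from the abstract alone, not the authors' code; it assumes scikit-learn's LinearSVC (a modern stand-in for the linear SVM trainer used in the paper), and the function and parameter names are illustrative.

```python
# Minimal sketch of SVM-normal feature selection, assuming scikit-learn.
# Not the paper's implementation; a reconstruction from the abstract.
import numpy as np
from sklearn.svm import LinearSVC

def svm_feature_selection(X_subset, y_subset, n_features_to_keep, C=1.0):
    """Train a linear SVM on a subset of the training data and keep the
    features whose components in the separating hyperplane's normal
    vector have the largest absolute values."""
    svm = LinearSVC(C=C)
    svm.fit(X_subset, y_subset)
    weights = np.abs(svm.coef_.ravel())              # |w_j| for each feature j
    keep = np.argsort(weights)[::-1][:n_features_to_keep]
    return np.sort(keep)                             # indices of retained features

# Usage (illustrative): select features on a small subset, then train the
# final classifier on the full training set restricted to those features,
# which is the two-stage scheme the abstract describes.
# selected = svm_feature_selection(X_small, y_small, n_features_to_keep=500)
# final_clf = LinearSVC().fit(X_full[:, selected], y_full)
```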
Pages: 261-273
Page count: 13
Related Papers
50 records in total
  • [1] Hong Xia; Bao Qing Hu. Feature Selection using Fuzzy Support Vector Machines. Fuzzy Optimization and Decision Making, 2006, 5(2): 187-192
  • [2] Hermes, L; Buhmann, JM. Feature selection for support vector machines. 15th International Conference on Pattern Recognition, Vol. 2, Proceedings: Pattern Recognition and Neural Networks, 2000: 712-715
  • [3] Devendran, V; Thiagarajan, Hemalatha; Santra, A. K.; Wahi, Amitabh. Feature selection for scene categorization using support vector machines. CISP 2008: First International Congress on Image and Signal Processing, Vol. 1, Proceedings, 2008: 588-+
  • [4] Maldonado, Sebastian; Weber, Richard. A wrapper method for feature selection using Support Vector Machines. Information Sciences, 2009, 179(13): 2208-2217
  • [5] Breneman, CM; Bennett, KP; Bi, JB; Embrechts, MJ; Song, MH. Caco-2 permeability modeling: Feature selection via sparse support vector machines. Abstracts of Papers of the American Chemical Society, 2002, 223: U349-U349
  • [6] Li, Guo-Zheng; Liu, Tian-Yu. Feature selection for bagging of support vector machines. PRICAI 2006: Trends in Artificial Intelligence, Proceedings, 2006, 4099: 271-277
  • [7] Aazi, F. Z.; Abdesselam, R.; Achchab, B.; Elouardighi, A. Feature selection for multiclass support vector machines. AI Communications, 2016, 29(5): 583-593
  • [8] Kamkar, Iman; Gupta, Sunil Kumar; Dinh Phung; Venkatesh, Svetha. Stable Feature Selection with Support Vector Machines. AI 2015: Advances in Artificial Intelligence, 2015, 9457: 298-308
  • [9] Nguyen, Minh Hoai; de la Torre, Fernando. Optimal feature selection for support vector machines. Pattern Recognition, 2010, 43(3): 584-591
  • [10] Liang, Zhizheng; Zhao, Tuo. Feature selection for linear support vector machines. 18th International Conference on Pattern Recognition, Vol. 2, Proceedings, 2006: 606-609