Feature selection using support vector machines.

Cited by: 0
Authors
Brank, J [1 ]
Grobelnik, M [1 ]
Milic-Frayling, N [1 ]
Mladenic, D [1 ]
Affiliations
[1] Jozef Stefan Inst, Ljubljana, Slovenia
Source
DATA MINING III | 2002, Vol. 6
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Text categorization is the task of classifying natural language documents into a set of predefined categories. Under the vector space model, documents are typically represented by sparse vectors: each word in the vocabulary is mapped to one coordinate axis, and its occurrence in a document gives rise to one nonzero component in the vector representing that document. When training classifiers on large collections of documents, both the time and memory requirements of processing these vectors may be prohibitive. This calls for a feature selection method, not only to reduce the number of features but also to increase the sparsity of document vectors. We propose a feature selection method based on linear Support Vector Machines (SVMs). First, we train a linear SVM on a subset of the training data and retain only those features that correspond to highly weighted components (in the absolute value sense) of the normal to the resulting hyperplane separating positive and negative examples. This reduced feature space is then used to train a classifier over a larger training set, because more documents now fit into the same amount of memory. In our experiments we compare the effectiveness of SVM-based feature selection with that of more traditional feature selection methods, such as odds ratio and information gain, in achieving the desired tradeoff between vector sparsity and classification performance. Experimental results indicate that, at the same level of vector sparsity, feature selection based on SVM normals yields better classification performance than odds ratio- or information gain-based feature selection when linear SVM classifiers are used.
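The selection procedure described in the abstract can be illustrated briefly. The sketch below is a reconstruction from the abstract alone, not the authors' code; it assumes scikit-learn's LinearSVC (a modern stand-in for the linear SVM trainer used in the paper), and the function and parameter names are illustrative.

```python
# Minimal sketch of SVM-normal feature selection, assuming scikit-learn.
# Not the paper's implementation; a reconstruction from the abstract.
import numpy as np
from sklearn.svm import LinearSVC

def svm_feature_selection(X_subset, y_subset, n_features_to_keep, C=1.0):
    """Train a linear SVM on a subset of the training data and keep the
    features whose components in the separating hyperplane's normal
    vector have the largest absolute values."""
    svm = LinearSVC(C=C)
    svm.fit(X_subset, y_subset)
    weights = np.abs(svm.coef_.ravel())              # |w_j| for each feature j
    keep = np.argsort(weights)[::-1][:n_features_to_keep]
    return np.sort(keep)                             # indices of retained features

# Usage (illustrative): select features on a small subset, then train the
# final classifier on the full training set restricted to those features,
# which is the two-stage scheme the abstract describes.
# selected = svm_feature_selection(X_small, y_small, n_features_to_keep=500)
# final_clf = LinearSVC().fit(X_full[:, selected], y_full)
```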
Pages: 261-273
Page count: 13
Related Papers
50 records in total
  • [1] Hong Xia; Bao Qing Hu. Feature Selection using Fuzzy Support Vector Machines. Fuzzy Optimization and Decision Making, 2006, 5(2): 187-192
  • [2] Hermes, L; Buhmann, JM. Feature selection for support vector machines. 15th International Conference on Pattern Recognition, Vol. 2, Proceedings: Pattern Recognition and Neural Networks, 2000: 712-715
  • [3] Devendran, V; Thiagarajan, Hemalatha; Santra, A. K.; Wahi, Amitabh. Feature selection for scene categorization using support vector machines. CISP 2008: First International Congress on Image and Signal Processing, Vol. 1, Proceedings, 2008: 588-+
  • [4] Maldonado, Sebastian; Weber, Richard. A wrapper method for feature selection using Support Vector Machines. Information Sciences, 2009, 179(13): 2208-2217
  • [5] Breneman, CM; Bennett, KP; Bi, JB; Embrechts, MJ; Song, MH. Caco-2 permeability modeling: Feature selection via sparse support vector machines. Abstracts of Papers of the American Chemical Society, 2002, 223: U349-U349
  • [6] Li, Guo-Zheng; Liu, Tian-Yu. Feature selection for bagging of support vector machines. PRICAI 2006: Trends in Artificial Intelligence, Proceedings, 2006, 4099: 271-277
  • [7] Aazi, F. Z.; Abdesselam, R.; Achchab, B.; Elouardighi, A. Feature selection for multiclass support vector machines. AI Communications, 2016, 29(5): 583-593
  • [8] Kamkar, Iman; Gupta, Sunil Kumar; Dinh Phung; Venkatesh, Svetha. Stable Feature Selection with Support Vector Machines. AI 2015: Advances in Artificial Intelligence, 2015, 9457: 298-308
  • [9] Nguyen, Minh Hoai; de la Torre, Fernando. Optimal feature selection for support vector machines. Pattern Recognition, 2010, 43(3): 584-591
  • [10] Liang, Zhizheng; Zhao, Tuo. Feature selection for linear support vector machines. 18th International Conference on Pattern Recognition, Vol. 2, Proceedings, 2006: 606-609