A Variance-mean Based Feature Selection in Text Classification

Cited by: 3
Authors
Yin, Shen [1]
Jiang, Zongli [1]
Affiliation
[1] Beijing Univ Technol, Beijing, Peoples R China
Keywords
feature selection; variance-mean; text classification
DOI
10.1109/ETCS.2009.646
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Feature selection is an important step in text classification: it chooses the subset of features relevant to a particular application. Building on the mutual information method, we design a variance-mean based feature selection method (VM). After computing and ranking the variance of the class discrimination value vector of each word, we choose the most distinguishable features. The method is advantageous when only a small number of features is selected, especially for classes with few training documents. It keeps the best features and thus improves the final performance of the classification system. Experimental results indicate the effectiveness of the proposed feature selection method in text classification.
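The abstract does not give the exact formulas, so the following is only a minimal sketch of the general idea under stated assumptions: the "class discrimination value" of a word for a class is taken here to be a smoothed pointwise mutual information, log P(t|c) / P(t), estimated from document frequencies, and words are ranked by the variance of that vector across classes. The function names `vm_scores` and `select_features` and the `smoothing` parameter are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def vm_scores(docs, labels, smoothing=1.0):
    """Score each term by the variance of its per-class discrimination values.

    ASSUMPTION: the discrimination value is a smoothed pointwise mutual
    information, log(P(term | class) / P(term)); the paper may define it
    differently.
    """
    classes = sorted(set(labels))
    n_docs = len(docs)
    class_count = Counter(labels)      # number of documents per class
    term_count = Counter()             # number of documents containing the term
    joint = defaultdict(Counter)       # joint[term][cls]: docs of cls containing term

    for tokens, cls in zip(docs, labels):
        for term in set(tokens):       # count each term once per document
            term_count[term] += 1
            joint[term][cls] += 1

    scores = {}
    for term, df in term_count.items():
        p_t = (df + smoothing) / (n_docs + 2 * smoothing)
        values = []
        for cls in classes:
            p_t_given_c = (joint[term][cls] + smoothing) / (
                class_count[cls] + 2 * smoothing
            )
            values.append(math.log(p_t_given_c / p_t))
        mean = sum(values) / len(values)
        # Population variance of the discrimination vector across classes.
        scores[term] = sum((v - mean) ** 2 for v in values) / len(values)
    return scores


def select_features(docs, labels, k):
    """Return the k terms with the highest variance score (ties broken by name)."""
    scores = vm_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Under these assumptions, a word that occurs equally often in every class gets a variance of zero and ranks last, while a word concentrated in one class gets a large variance and ranks first.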
Pages: 519-522
Page count: 4
    [J]. JOURNAL OF PHYTOPATHOLOGY-PHYTOPATHOLOGISCHE ZEITSCHRIFT, 1995, 143 (09): : 513 - 518