Distance Variance Score: An Efficient Feature Selection Method in Text Classification

Cited by: 15
Authors
Wang, Heyong [1]
Hong, Ming [1]
Affiliations
[1] S China Univ Technol, Dept E Business, Guangzhou 510006, Guangdong, Peoples R China
Keywords
Laplacian Score
DOI
10.1155/2015/695720
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
With the rapid development of web applications such as social networks, a large amount of electronic text data has accumulated and become available on the Internet, which has increased interest in text mining. Text classification is one of the most important subfields of text mining. Text documents are usually represented as a high-dimensional, sparse document-term matrix (DTM) before classification. Because of this high dimensionality and sparsity, feature selection is essential for text classification. An efficient feature selection method both reduces the dimensionality of the DTM and selects discriminative features for text classification. Laplacian Score (LS) is an unsupervised feature selection method that has been used successfully in areas such as face recognition. However, LS is unable to select discriminative features for text classification or to reduce the sparsity of the DTM effectively. To address this, the paper proposes an unsupervised feature selection method named Distance Variance Score (DVS). DVS uses feature distance contribution (a ratio) to rank the importance of features for text documents so as to select discriminative features. Experimental results indicate that DVS is able to select discriminative features and reduce the sparsity of the DTM. Thus, it is much more efficient than LS.
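To make the notion of a sparse document-term matrix concrete, the following is a minimal sketch, not the paper's preprocessing or its DVS method. It assumes plain whitespace tokenization and raw term counts; the helper names `build_dtm` and `sparsity` and the three sample documents are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_dtm(docs):
    """Return (vocab, matrix), where matrix[i][j] counts term j in document i.

    Assumption: lowercase whitespace tokenization, no stop-word removal.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(toks).items():
            row[index[term]] = count
        matrix.append(row)
    return vocab, matrix


def sparsity(matrix):
    """Fraction of zero entries in the matrix (0.0 for an empty matrix)."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total if total else 0.0


# Hypothetical toy corpus; real corpora have far more terms than documents.
docs = [
    "text mining finds patterns in text",
    "feature selection reduces dimensions",
    "text classification needs feature selection",
]
vocab, dtm = build_dtm(docs)
print(len(vocab), "terms,", f"sparsity = {sparsity(dtm):.3f}")
```

Even in this tiny corpus more than half of the entries are zero, because each document uses only a few of the vocabulary terms. A feature selection method that keeps a subset of the columns changes both the dimensionality and the sparsity of the resulting matrix.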
Pages: 10