An enhanced feature selection method for text classification

Cited by: 0
Authors
Kang, Jinbeom [1 ]
Lee, Eunshil [1 ]
Hong, Kwanghee [1 ]
Park, Jeahyun [1 ]
Kim, Taehwan [1 ]
Park, Juyoung [1 ]
Choi, Joongmin [1 ]
Yang, Jaeyoung [1 ]
Affiliation
[1] Hanyang Univ, Dept Comp Sci & Engn, Ansan, Kyunggi Do, South Korea
Keywords
feature selection; impurity of words; unbalanced distribution; machine learning; text classification;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection in machine learning is the task of identifying a set of representative terms, or features, from a document collection; these features are then used mainly for text classification. Existing feature selection methods, including information gain and the χ² test, focus on features that are useful across all topics, and consequently lack the power to select features that are truly representative of a particular topic (or class). These methods also assume that the distribution of documents across classes is balanced. This assumption negatively affects classification accuracy, because real-world document collections rarely have a balanced distribution and it is difficult to prepare a training set with an equal number of documents for each class. To resolve this problem, we propose a new feature selection method for text classification that focuses on the purity of a word, which emphasizes its representativeness for a particular class. Our method also assumes an unbalanced distribution of documents over multiple classes, and combines feature values with weight factors that reflect the number of training documents in each class. In summary, we obtain feature candidates using word purity and then select features while accounting for the unbalanced distribution of documents. We demonstrate experimentally that our method outperforms existing methods.
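The abstract does not give the scoring formula, so the following is only a rough sketch of the general idea, not the authors' method. It assumes a hypothetical purity score: for each word and class, the share of the word's class-size-normalized document frequency that falls in that class. The function names `class_weighted_purity` and `select_features`, and the normalization itself, are illustrative assumptions.

```python
from collections import Counter, defaultdict


def class_weighted_purity(docs, labels):
    """Return {(word, cls): purity} for tokenized docs and their class labels.

    ASSUMPTION: this scoring is illustrative only; the source abstract does
    not state the actual purity formula or weight factors.
    """
    class_sizes = Counter(labels)

    # Document frequency of each word within each class.
    doc_freq = defaultdict(Counter)
    for tokens, cls in zip(docs, labels):
        for word in set(tokens):
            doc_freq[word][cls] += 1

    scores = {}
    for word, per_class in doc_freq.items():
        # Dividing by class size keeps a large class from dominating
        # the score merely because it has more training documents.
        rates = {c: per_class[c] / class_sizes[c] for c in class_sizes}
        total = sum(rates.values())
        for cls, rate in rates.items():
            scores[(word, cls)] = rate / total
    return scores


def select_features(docs, labels, k):
    """Pick the k highest-purity words per class (ties broken alphabetically)."""
    per_class = defaultdict(list)
    for (word, cls), score in class_weighted_purity(docs, labels).items():
        per_class[cls].append((score, word))
    return {
        cls: [w for _, w in sorted(items, key=lambda t: (-t[0], t[1]))[:k]]
        for cls, items in per_class.items()
    }


if __name__ == "__main__":
    # Toy, deliberately unbalanced collection: 3 sports docs, 1 politics doc.
    docs = [["goal", "match"], ["goal", "team"], ["goal", "vote"], ["vote", "law"]]
    labels = ["sports", "sports", "sports", "politics"]
    print(select_features(docs, labels, k=2))
```

In this toy collection, "vote" appears in one document of each class, so raw counts would split it evenly between the two classes; after normalizing by class size it leans toward the smaller "politics" class. A purity-only score ignores how often a word occurs overall, so a practical variant would likely combine it with a frequency or coverage term.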
Pages: 36-41
Number of pages: 6