Feature Selection for Text Classification Using Mutual Information

Cited: 2
Authors
Sel, Ilhami [1 ]
Karci, Ali [1 ]
Hanbay, Davut [1 ]
Affiliations
[1] Inonu Univ, Bilgisayar Muhendisligi Bolumu, Malatya, Turkey
Keywords
Natural Language Processing; Doc2Vec; Mutual Information; Maximum Entropy;
DOI
10.1109/idap.2019.8875927
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Feature selection can be defined as choosing the best subset of features to represent a data set, that is, removing unnecessary data that does not affect the result. In classification applications, reducing dimensionality through feature selection can increase both the efficiency and the accuracy of the system. In this study, text classification was applied using the "20 news group" data published by Reuters news agency. The pre-processed news documents were converted into vectors using the Doc2Vec method to create a data set. This data set was classified with the Maximum Entropy classification method. Afterwards, subsets of the data set were created using the Mutual Information method for feature selection. Classification was then repeated on the resulting subsets and the results were compared by performance rate. Before feature selection, the accuracy of the system with 600 features was 0.9285; the performance rates of the 200-, 100-, 50-, and 20-feature models were 0.9454, 0.9426, 0.9407, and 0.9123, respectively. When the results were examined, the 50-feature model was more successful than the 600-feature model initially created.
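The pipeline summarized in the abstract (Doc2Vec document vectors, a Maximum Entropy classifier, and Mutual Information feature selection) can be sketched as below. This is a minimal, hypothetical Python sketch assuming gensim and scikit-learn, not the authors' implementation: LogisticRegression stands in for the Maximum Entropy classifier (the two models are equivalent), and the hyperparameters (vector size 600, epochs, the k values 200/100/50/20) only mirror the numbers quoted in the abstract rather than the paper's exact configuration.

# Hypothetical sketch (not the authors' code): Doc2Vec vectors,
# a Maximum Entropy classifier (multinomial logistic regression),
# and Mutual Information feature selection on 20 Newsgroups.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load and tokenize the corpus.
news = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))
docs = [simple_preprocess(text) for text in news.data]
tagged = [TaggedDocument(words, [i]) for i, words in enumerate(docs)]

# 600-dimensional document vectors, mirroring the feature size quoted in the abstract.
d2v = Doc2Vec(vector_size=600, min_count=2, epochs=20)
d2v.build_vocab(tagged)
d2v.train(tagged, total_examples=d2v.corpus_count, epochs=d2v.epochs)
X = [d2v.infer_vector(words) for words in docs]
y = news.target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Baseline: Maximum Entropy classifier on all 600 features.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("600 features:", accuracy_score(y_te, clf.predict(X_te)))

# Mutual Information feature selection, then reclassification on each subset.
for k in (200, 100, 50, 20):
    sel = SelectKBest(mutual_info_classif, k=k).fit(X_tr, y_tr)
    clf_k = LogisticRegression(max_iter=1000).fit(sel.transform(X_tr), y_tr)
    print(f"{k} features:", accuracy_score(y_te, clf_k.predict(sel.transform(X_te))))

The accuracies printed by such a sketch would not be expected to match the paper's figures exactly, since the train/test split, Doc2Vec training, and classifier settings are assumptions.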
Pages: 4
Related Papers
50 records in total
  • [31] Feature Selection in Text Classification
    Sahin, Durmus Ozkan
    Ates, Nurullah
    Kilic, Erdal
    [J]. 2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 1777 - 1780
  • [32] Feature Selection Using Maximum Feature Tree Embedded with Mutual Information and Coefficient of Variation for Bird Sound Classification
    Xu, Haifeng
    Zhang, Yan
    Liu, Jiang
    Lv, Danjv
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2021, 2021
  • [33] Mutual information criterion for feature selection with application to classification of breast microcalcifications
    Diamant, Idit
    Shalhon, Moran
    Goldberger, Jacob
    Greenspan, Hayit
    [J]. MEDICAL IMAGING 2016: IMAGE PROCESSING, 2016, 9784
  • [34] Feature selection, mutual information, and the classification of high-dimensional patterns
    Bonev, Boyan
    Escolano, Francisco
    Cazorla, Miguel
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2008, 11 (3-4) : 309 - 319
  • [35] Mutual Information-Based Feature Selection and Ensemble Learning for Classification
    Qi, Chengming
    Zhou, Zhangbing
    Wang, Qun
    Hu, Lishuan
    [J]. 2016 INTERNATIONAL CONFERENCE ON IDENTIFICATION, INFORMATION AND KNOWLEDGE IN THE INTERNET OF THINGS (IIKI), 2016, : 116 - 121
  • [36] An Improved Feature Selection Algorithm with Conditional Mutual Information for Classification Problems
    Palanichamy, Jaganathan
    Ramasamy, Kuppuchamy
    [J]. 2013 INTERNATIONAL CONFERENCE ON HUMAN COMPUTER INTERACTIONS (ICHCI), 2013,
  • [37] A Fuzzy Mutual Information-based Feature Selection Method for Classification
Hoque, N.
    Ahmed, H. A.
    Bhattacharyya, D. K.
    Kalita, J. K.
    [J]. FUZZY INFORMATION AND ENGINEERING, 2016, 8 (03) : 355 - 384
  • [38] Feature Selection For Text Classification Using Genetic Algorithms
    Bidi, Noria
    Elberrichi, Zakaria
    [J]. PROCEEDINGS OF 2016 8TH INTERNATIONAL CONFERENCE ON MODELLING, IDENTIFICATION & CONTROL (ICMIC 2016), 2016, : 806 - 810
  • [39] Feature Selection by Using Heuristic Methods for Text Classification
    Sel, Ilhami
    Yeroglu, Celalettin
    Hanbay, Davut
    [J]. 2019 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP 2019), 2019,
  • [40] Nonlinear probit gene classification using mutual information and wavelet-based feature selection
    Zhou, XB
    Wang, XD
    Dougherty, ER
    [J]. JOURNAL OF BIOLOGICAL SYSTEMS, 2004, 12 (03) : 371 - 386