Feature Selection for Text Classification Using Mutual Information

Cited by: 2
Authors
Sel, Ilhami [1 ]
Karci, Ali [1 ]
Hanbay, Davut [1 ]
Affiliations
[1] Inonu Univ, Bilgisayar Muhendisligi Bolumu, Malatya, Turkey
Keywords
Natural Language Processing; Doc2Vec; Mutual Information; Maximum Entropy
DOI
10.1109/idap.2019.8875927
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Feature selection can be defined as choosing the best subset of features to represent a data set, that is, removing unnecessary data that does not affect the result. Reducing dimensionality through feature selection can increase both the efficiency and the accuracy of a classification system. In this study, text classification was applied to the "20 news group" data published by the Reuters news agency. The pre-processed news data were converted into vectors using the Doc2Vec method to create a data set, which was then classified with the Maximum Entropy classification method. Afterwards, subsets of the data set were created using the Mutual Information method for feature selection, reclassification was performed on each resulting subset, and the results were compared by performance rate. While the accuracy of the system with 600 features was 0.9285 before feature selection, the 200-, 100-, 50-, and 20-feature models achieved 0.9454, 0.9426, 0.9407, and 0.9123, respectively. Examination of the results shows that the 50-feature model outperformed the 600-feature model initially created.
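The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: synthetic vectors stand in for the paper's Doc2Vec embeddings of the 20 Newsgroups corpus, scikit-learn's `LogisticRegression` stands in for the Maximum Entropy classifier (multinomial logistic regression is the standard Maximum Entropy model), and `mutual_info_classif` performs the Mutual Information feature scoring.

```python
# Sketch of the paper's pipeline: dense document vectors -> Maximum Entropy
# classification -> Mutual Information feature selection -> reclassification.
# The data below are synthetic stand-ins for 600-dimensional Doc2Vec vectors.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 600-feature Doc2Vec data set.
X, y = make_classification(n_samples=1000, n_features=600,
                           n_informative=40, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Baseline: Maximum Entropy (logistic regression) on all 600 features.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
base_acc = clf.score(X_te, y_te)

# Keep the k features with the highest mutual information with the labels,
# then reclassify; the paper repeats this for k = 200, 100, 50, 20.
selector = SelectKBest(mutual_info_classif, k=50).fit(X_tr, y_tr)
clf_mi = LogisticRegression(max_iter=1000).fit(selector.transform(X_tr), y_tr)
mi_acc = clf_mi.score(selector.transform(X_te), y_te)

print(f"600 features: {base_acc:.4f}  |  50 MI-selected features: {mi_acc:.4f}")
```

On this synthetic data the exact accuracies will differ from the paper's reported rates; the point is the shape of the comparison, where a small MI-selected subset can match or beat the full feature set.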
Pages: 4