Feature Selection for Text Classification Using Mutual Information

Cited by: 2
Authors
Sel, Ilhami [1 ]
Karci, Ali [1 ]
Hanbay, Davut [1 ]
Affiliations
[1] Inonu Univ, Bilgisayar Muhendisligi Bolumu, Malatya, Turkey
Keywords
Natural Language Processing; Doc2Vec; Mutual Information; Maximum Entropy;
DOI
10.1109/idap.2019.8875927
CLC classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection can be defined as choosing the best subset of features to represent a data set, that is, removing unnecessary data that does not affect the result. In classification applications, reducing the dimensionality through feature selection can increase the efficiency and accuracy of the system. In this study, text classification was applied to the "20 news group" data published by the Reuters news agency. The pre-processed news data were converted into vectors using the Doc2Vec method to create a data set, which was then classified with the Maximum Entropy classification method. Afterwards, subsets of the features were created using the Mutual Information method, reclassification was performed with each resulting subset, and the results were compared by performance rate. Before feature selection, the system with 600 features achieved a success rate of 0.9285; the 200-, 100-, 50-, and 20-feature models then achieved performance rates of 0.9454, 0.9426, 0.9407, and 0.9123, respectively. When the results were examined, the 50-feature model was more successful than the 600-feature model initially created.
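The pipeline described in the abstract (document vectors → mutual-information feature selection → maximum-entropy classification) can be sketched with scikit-learn. This is an illustrative reconstruction, not the authors' code: the synthetic vectors stand in for the Doc2Vec embeddings of the 20 Newsgroups data, `mutual_info_classif` plays the role of the Mutual Information method, and multinomial logistic regression is used as the maximum-entropy classifier. All sizes and names here are assumptions.

```python
# Hedged sketch of the abstract's pipeline: select the k features with the
# highest mutual information with the class label, then train a
# maximum-entropy classifier (multinomial logistic regression) on them.
# The data below are synthetic stand-ins for Doc2Vec document vectors.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_docs, n_features = 500, 60          # toy sizes (the paper starts from 600 features)
y = rng.integers(0, 4, size=n_docs)   # 4 pseudo-classes
X = rng.normal(size=(n_docs, n_features))
X[:, :10] += y[:, None] * 0.8         # make only the first 10 features informative

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Keep the 10 features sharing the most mutual information with the label.
selector = SelectKBest(mutual_info_classif, k=10).fit(X_tr, y_tr)
clf = LogisticRegression(max_iter=1000).fit(selector.transform(X_tr), y_tr)
acc = accuracy_score(y_te, clf.predict(selector.transform(X_te)))
print(f"accuracy with 10 selected features: {acc:.3f}")
```

On such data the selected 10-feature model typically matches or beats a model trained on all 60 features, mirroring the paper's finding that the 50-feature model outperformed the 600-feature one.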
Pages: 4
Related papers
50 records in total
  • [1] Feature selection using improved mutual information for text classification
    Novovicová, J
    Malík, A
    Pudil, P
    [J]. STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, PROCEEDINGS, 2004, 3138 : 1010 - 1017
  • [2] Feature selection algorithm for text classification based on improved mutual information
    Cong Shuai
    Zhang Jibin
    Xu Zhiming
    Wang Yuying
    [J]. Journal of Harbin Institute of Technology (New Series), 2011, (03) : 144 - 148
  • [3] Mutual Information Using Sample Variance for Text Feature Selection
    Agnihotri, Deepak
    Verma, Kesari
    Tripathi, Priyanka
    [J]. PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON COMMUNICATION AND INFORMATION PROCESSING (ICCIP 2017), 2017, : 39 - 44
  • [4] Discriminant Mutual Information for Text Feature Selection
    Wang, Jiaqi
    Zhang, Li
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT II, 2021, 12682 : 136 - 151
  • [5] Modified Pointwise Mutual Information-Based Feature Selection for Text Classification
    Georgieva-Trifonova, Tsvetanka
    [J]. PROCEEDINGS OF THE FUTURE TECHNOLOGIES CONFERENCE (FTC) 2021, VOL 2, 2022, 359 : 333 - 353
  • [6] Improved Mutual Information Method For Text Feature Selection
    Ding Xiaoming
    Tang Yan
    [J]. PROCEEDINGS OF THE 2013 8TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION (ICCSE 2013), 2013, : 163 - 166
  • [7] Semantic similarity-aware feature selection and redundancy removal for text classification using joint mutual information
    Lazhar, Farek
    Amira, Benaidja
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2024,
  • [8] Feature selection using mutual information based uncertainty measures for tumor classification
    Sun, Lin
    Xu, Jiucheng
    [J]. BIO-MEDICAL MATERIALS AND ENGINEERING, 2014, 24 (01) : 763 - 770
  • [9] Feature selection for multi-label classification using multivariate mutual information
    Lee, Jaesung
    Kim, Dae-Won
    [J]. PATTERN RECOGNITION LETTERS, 2013, 34 (03) : 349 - 357
  • [10] Conditional mutual information based feature selection for classification task
    Novovicova, Jana
    Somol, Petr
    Haindl, Michal
    Pudil, Pavel
    [J]. PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS AND APPLICATIONS, PROCEEDINGS, 2007, 4756 : 417 - 426