A high-quality feature selection method based on frequent and correlated items for text classification

Cited by: 19
Authors
Farghaly, Heba Mamdouh [1]
Abd El-Hafeez, Tarek [1, 2]
Affiliations
[1] Minia Univ, Fac Sci, Dept Comp Sci, El Minia, Egypt
[2] Deraya Univ, Comp Sci Unit, El Minia, Egypt
Keywords
Feature selection; Dimensionality reduction; Text classification; Association rule mining; Feature interaction;
DOI
10.1007/s00500-023-08587-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection is a significant challenge in pattern recognition, especially for classification tasks. The quality of the selected features plays a critical role in building effective models, and poor-quality data makes this process more difficult. This work explores the use of association analysis from data mining to select meaningful features, addressing the issue of redundant information in the selected features. A novel feature selection technique for text classification is proposed, based on frequent and correlated items. The method considers both relevance and feature interactions, using association as a metric to evaluate the relationship between the target and the features. The technique was tested on the SMS Spam Collection dataset from the UCI Machine Learning Repository and compared with well-known feature selection methods. The results showed that the proposed technique effectively reduced redundant information while achieving high accuracy (95.155%) using only 6% of the features.
Pages: 11259-11274
Page count: 16
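The abstract describes selecting features that are frequent, associated with the target, and not redundant with one another. The sketch below illustrates that general idea on a toy corpus; it is not the authors' algorithm. The function name `select_features`, the thresholds, the use of rule confidence as the association measure, and the Jaccard overlap test for redundancy are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def select_features(
    docs: List[str],
    labels: List[str],
    min_support: float = 0.3,
    min_confidence: float = 0.8,
    max_overlap: float = 0.6,
) -> List[str]:
    """Pick terms that are frequent, class-associated, and mutually non-redundant.

    Illustrative only: thresholds and measures are assumptions, not the paper's.
    """
    n = len(docs)

    # term -> set of document indices that contain the term
    postings: Dict[str, Set[int]] = {}
    for i, doc in enumerate(docs):
        for term in set(doc.lower().split()):
            postings.setdefault(term, set()).add(i)

    # class label -> set of document indices with that label
    class_docs: Dict[str, Set[int]] = {}
    for i, label in enumerate(labels):
        class_docs.setdefault(label, set()).add(i)

    candidates = []
    for term, ids in postings.items():
        support = len(ids) / n  # fraction of documents containing the term
        if support < min_support:
            continue
        # confidence of the best rule "term -> class"
        confidence = max(len(ids & members) / len(ids) for members in class_docs.values())
        if confidence >= min_confidence:
            candidates.append((confidence, support, term))

    # strongest association first, then most frequent, then alphabetical
    candidates.sort(key=lambda c: (-c[0], -c[1], c[2]))

    selected: List[str] = []
    for _, _, term in candidates:
        ids = postings[term]
        # skip a term whose documents overlap heavily with an already chosen term
        redundant = any(
            len(ids & postings[s]) / len(ids | postings[s]) > max_overlap
            for s in selected
        )
        if not redundant:
            selected.append(term)
    return selected


# Toy corpus (invented for this example, not taken from the dataset)
docs = [
    "win free prize now",
    "free prize claim now",
    "free entry win cash",
    "are you home now",
    "see you at home",
    "call me when home",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
selected = select_features(docs, labels)
```

On this toy corpus, "now" is frequent but dropped because it appears in both classes, while "prize" and "win" are dropped as redundant because they co-occur almost entirely with "free".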