A high-quality feature selection method based on frequent and correlated items for text classification

被引:19
|
作者
Farghaly, Heba Mamdouh [1 ]
Abd El-Hafeez, Tarek [1 ,2 ]
机构
[1] Minia Univ, Fac Sci, Dept Comp Sci, El Minia, Egypt
[2] Deraya Univ, Comp Sci Unit, El Minia, Egypt
关键词
Feature selection; Dimensionality reduction; Text classification; Association rule mining; Feature interaction;
D O I
10.1007/s00500-023-08587-x
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The feature selection problem is a significant challenge in pattern recognition, especially for classification tasks. The quality of the selected features plays a critical role in building effective models, and poor-quality data can make this process more difficult. This work explores the use of association analysis in data mining to select meaningful features, addressing the issue of duplicated information in the selected features. A novel feature selection technique for text classification is proposed, based on frequent and correlated items. This method considers both relevance and feature interactions, using association as a metric to evaluate the relationship between the target and features. The technique was tested using the SMS spam collecting dataset from the UCI machine learning repository and compared with well-known feature selection methods. The results showed that the proposed technique effectively reduced redundant information while achieving high accuracy (95.155%) using only 6% of the features.
引用
收藏
页码:11259 / 11274
页数:16
相关论文
共 50 条
  • [31] A hybrid method of feature selection for Chinese text sentiment classification
    Wang, Suge
    Wei, Yingjie
    Li, Deyu
    Zhang, Wu
    Li, Wei
    [J]. FOURTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 3, PROCEEDINGS, 2007, : 435 - +
  • [32] A feature selection method to handle imbalanced data in text classification
    Chang, Fengxiang
    Guo, Jun
    Xu, Weiran
    Yao, Kejun
    [J]. Journal of Digital Information Management, 2015, 13 (03): : 169 - 175
  • [33] Research on Feature Selection Method in Chinese Text Automatic Classification
    Hong, Ying
    Shao, Xiwen
    [J]. PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON APPLIED SCIENCE AND ENGINEERING INNOVATION, 2015, 12 : 1759 - 1763
  • [34] Two-stage Feature Selection Method for Text Classification
    Li Xi
    Dai Hang
    Wang Mingwen
    [J]. MINES 2009: FIRST INTERNATIONAL CONFERENCE ON MULTIMEDIA INFORMATION NETWORKING AND SECURITY, VOL 1, PROCEEDINGS, 2009, : 234 - +
  • [35] Research on feature selection method in Chinese text automatic classification
    Hong, Ying
    Geng, Zengmin
    [J]. ENERGY SCIENCE AND APPLIED TECHNOLOGY, 2016, : 359 - 361
  • [36] A novel filter feature selection method for text classification: Extensive Feature Selector
    Parlak, Bekir
    Uysal, Alper Kursat
    [J]. JOURNAL OF INFORMATION SCIENCE, 2023, 49 (01) : 59 - 78
  • [37] A NEW FEATURE SELECTION METHOD BASED ON CONCEPT EXTRACTION IN AUTOMATIC CHINESE TEXT CLASSIFICATION
    Liao, Shasha
    Jiang, Minghu
    [J]. NEW MATHEMATICS AND NATURAL COMPUTATION, 2007, 3 (03) : 331 - 347
  • [38] A method of the feature selection in hierarchical text classification based on the category discrimination and position information
    Song, Jia
    Zhang, Pengzhou
    Qin, Sijun
    Gong, Junpeng
    [J]. 2015 INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS - COMPUTING TECHNOLOGY, INTELLIGENT TECHNOLOGY, INDUSTRIAL INFORMATION INTEGRATION (ICIICII), 2015, : 132 - +
  • [39] A Feature Selection Method Based on Fisher's Discriminant Ratio for Text Sentiment Classification
    Wang, Suge
    Li, Deyu
    Wei, Yingjie
    Li, Hongxia
    [J]. WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS, 2009, 5854 : 88 - +
  • [40] Dynamic feature selection in text classification
    Doan, Son
    Horiguchi, Susumu
    [J]. INTELLIGENT CONTROL AND AUTOMATION, 2006, 344 : 664 - 675