A high-quality feature selection method based on frequent and correlated items for text classification

被引:19
|
作者
Farghaly, Heba Mamdouh [1 ]
Abd El-Hafeez, Tarek [1 ,2 ]
机构
[1] Minia Univ, Fac Sci, Dept Comp Sci, El Minia, Egypt
[2] Deraya Univ, Comp Sci Unit, El Minia, Egypt
关键词
Feature selection; Dimensionality reduction; Text classification; Association rule mining; Feature interaction;
D O I
10.1007/s00500-023-08587-x
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The feature selection problem is a significant challenge in pattern recognition, especially for classification tasks. The quality of the selected features plays a critical role in building effective models, and poor-quality data can make this process more difficult. This work explores the use of association analysis in data mining to select meaningful features, addressing the issue of duplicated information in the selected features. A novel feature selection technique for text classification is proposed, based on frequent and correlated items. This method considers both relevance and feature interactions, using association as a metric to evaluate the relationship between the target and features. The technique was tested using the SMS spam collecting dataset from the UCI machine learning repository and compared with well-known feature selection methods. The results showed that the proposed technique effectively reduced redundant information while achieving high accuracy (95.155%) using only 6% of the features.
引用
收藏
页码:11259 / 11274
页数:16
相关论文
共 50 条
  • [1] A high-quality feature selection method based on frequent and correlated items for text classification
    Heba Mamdouh Farghaly
    Tarek Abd El-Hafeez
    [J]. Soft Computing, 2023, 27 : 11259 - 11274
  • [2] A new feature selection method based on frequent and associated itemsets for text classification
    Farghaly, Heba Mamdouh
    Abd El-Hafeez, Tarek
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (25):
  • [3] Text Guide: Improving the Quality of Long Text Classification by a Text Selection Method Based on Feature Importance
    Fiok, Krzysztof
    Karwowski, Waldemar
    Gutierrez-Franco, Edgar
    Davahli, Mohammad Reza
    Wilamowski, Maciej
    Ahram, Tareq
    Al-Juaid, Awad
    Zurada, Jozef
    [J]. IEEE ACCESS, 2021, 9 (09): : 105439 - 105450
  • [4] Interactive high-quality text classification
    Liu, Rey-Long
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2008, 44 (03) : 1062 - 1075
  • [5] Efficient Method for Feature Selection in Text Classification
    Sun, Jian
    Zhang, Xiang
    Liao, Dan
    Chang, Victor
    [J]. 2017 INTERNATIONAL CONFERENCE ON ENGINEERING AND TECHNOLOGY (ICET), 2017,
  • [6] Text feature selection method for hierarchical classification
    Zhu, Cui-Ling
    Ma, Jun
    Zhang, Dong-Mei
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2011, 24 (01): : 103 - 110
  • [7] A new feature selection method for text classification
    Uchyigit, Gulden
    Clark, Keith
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2007, 21 (02) : 423 - 438
  • [8] An enhanced feature selection method for text classification
    Kang, Jinbeom
    Lee, Eunshil
    Hong, Kwanghee
    Park, Jeahyun
    Kim, Taehwan
    Park, Juyoung
    Choi, Joongmin
    Yang, Jaeyoung
    [J]. PROCEEDINGS OF THE SECOND IASTED INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, 2006, : 36 - 41
  • [9] Feature Selection Method of Text Tendency Classification
    Li, Yanling
    Dai, Guanzhong
    Li, Gang
    [J]. FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008, : 34 - +
  • [10] A feature selection method based on synonym merging in text classification system
    Haipeng Yao
    Chong Liu
    Peiying Zhang
    Luyao Wang
    [J]. EURASIP Journal on Wireless Communications and Networking, 2017