A new feature selection method based on frequent and associated itemsets for text classification

Cited by: 16
Authors
Farghaly, Heba Mamdouh [1]
Abd El-Hafeez, Tarek [1, 2]
Affiliations
[1] Minia Univ, Fac Sci, Dept Comp Sci, El Minia, Egypt
[2] Deraya Univ, Comp Sci Unit, El Minia, Egypt
Keywords
association rule mining; dimensionality reduction; feature interaction; feature selection; text classification
DOI
10.1002/cpe.7258
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Feature selection is one of the major issues in pattern recognition. The quality of the selected features matters for classification, because low-quality data can degrade model construction. Selected features tend to contain redundant information, which is difficult to handle, so this article draws on association analysis from data mining to select important features. The study proposes a novel feature selection method based on frequent and associated itemsets (FS-FAI) for text classification. FS-FAI seeks relevant features while taking feature interaction into account, and it uses association as a metric to evaluate the relevance between the target concept and the feature(s). To evaluate the proposed method, several experiments were conducted on a BBC dataset from the BBC news website and the SMS Spam Collection dataset from the UCI Machine Learning Repository, and the results were compared with well-known feature selection methods. The reported results demonstrate the effectiveness of the proposed method in selecting high-quality features and in handling redundant information in text classification.
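
The abstract does not spell out how FS-FAI works, so the following is only a rough illustration of the general idea it names: keep terms that belong to itemsets that are both frequent and strongly associated with a class label. Everything here is an assumption made for illustration, not the authors' algorithm: the function name `select_features`, the thresholds `min_support` and `min_confidence`, the brute-force enumeration of itemsets up to `max_size`, and the toy documents.

```python
from collections import Counter
from itertools import combinations


def select_features(docs, labels, min_support=0.5, min_confidence=0.8, max_size=2):
    """Illustrative sketch (not FS-FAI itself): keep terms from itemsets that
    are frequent in the corpus and strongly associated with one class."""
    n_docs = len(docs)
    itemset_count = Counter()  # number of documents containing each itemset
    joint_count = Counter()    # number of documents containing itemset AND label

    for text, label in zip(docs, labels):
        terms = sorted(set(text.lower().split()))
        # Enumerate all itemsets up to max_size; fine for toy data only.
        for size in range(1, max_size + 1):
            for itemset in combinations(terms, size):
                itemset_count[itemset] += 1
                joint_count[(itemset, label)] += 1

    selected = set()
    for (itemset, label), joint in joint_count.items():
        support = itemset_count[itemset] / n_docs      # how frequent the itemset is
        confidence = joint / itemset_count[itemset]    # P(label | itemset)
        if support >= min_support and confidence >= min_confidence:
            selected.update(itemset)
    return sorted(selected)


# Hypothetical toy corpus, loosely in the spirit of an SMS spam task.
docs = [
    "win free prize now",
    "free prize call now",
    "call meeting at noon",
    "lunch meeting today",
]
labels = ["spam", "spam", "ham", "ham"]

print(select_features(docs, labels))
# ['free', 'meeting', 'now', 'prize']
```

In this toy run, "call" is frequent but appears equally in both classes, so its confidence is 0.5 and it is dropped, while "free", "prize", "now" and "meeting" each point to a single class. A practical implementation would replace the brute-force enumeration with a proper frequent-itemset miner such as Apriori or FP-Growth.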
Pages: 17