Estimator learning automata for feature subset selection in high-dimensional spaces, case study: Email spam detection

Cited by: 7
Authors
Seyyedi, Seyyed Hossein [1]
Minaei-Bidgoli, Behrouz [2]
Affiliations
[1] Islamic Azad Univ, Kashan Branch, Dept Comp Engn, Kashan, Iran
[2] Iran Univ Sci & Technol, Sch Comp Engn, Tehran, Iran
Keywords
data mining; dimension reduction; estimator learning automata; high-dimensional space; spam detection; text classification; optimization; algorithm; identification
DOI
10.1002/dac.3541
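Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809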
Abstract
One of the difficult challenges facing data miners is that algorithm performance degrades when the feature space contains redundant or irrelevant features. Therefore, dimension reduction is used as a critical preprocessing task to build a smaller space containing only valuable features. There are two approaches to dimension reduction: feature extraction and feature selection, the latter of which is further divided into wrapper and filter approaches. In high-dimensional spaces, feature extraction and wrapper approaches are impractical because of their time complexity. The filter approach, on the other hand, suffers from inaccuracy. One main reason for this inaccuracy is that the subset size is not determined with regard to the specifics of the problem. In this paper, we propose ESS (estimator learning automaton-based subset selection) as a new method for feature selection in high-dimensional spaces. The innovation of ESS is that it combines wrapper and filter ideas and uses estimator learning automata to efficiently determine a feature subset that yields a desirable tradeoff between the accuracy and the efficiency of the learning algorithm. To find a suitable subset for a particular processing algorithm operating on an arbitrary dataset, ESS uses an automaton to score each candidate subset based on the subset's size and the accuracy the learning algorithm achieves with it. In the end, the subset with the highest score is returned. We have applied ESS to feature selection for spam detection, a text classification task on email, which is a pervasive communication medium. The results show that this goal is achieved.
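The abstract describes ESS only at a high level, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes the features have already been ranked by some filter criterion, treats each candidate size k ("keep the top-k ranked features") as one action of a learning automaton, and uses a pursuit-style update (a standard estimator learning automaton scheme) to converge on the size with the best score. The scoring rule (accuracy minus size_penalty * k / n_features), the names ess_sketch and toy_accuracy, and the synthetic accuracy curve are all hypothetical and for illustration only.

```python
import random
from typing import Callable, Sequence


def ess_sketch(
    subset_sizes: Sequence[int],
    evaluate: Callable[[int], float],
    n_features: int,
    size_penalty: float = 0.1,
    learning_rate: float = 0.05,
    iterations: int = 1000,
    seed: int = 0,
) -> int:
    """Pick a subset size with a pursuit (estimator) learning automaton.

    Each action i means "keep the top subset_sizes[i] filter-ranked features".
    evaluate(k) returns a possibly noisy accuracy in [0, 1] for the learner
    trained on those k features (hypothetical interface).
    """
    rng = random.Random(seed)
    r = len(subset_sizes)
    probs = [1.0 / r] * r   # action probability vector
    totals = [0.0] * r      # summed scores per action
    counts = [0] * r        # times each action was tried

    def score(i: int) -> float:
        # Hypothetical tradeoff: reward accuracy, penalize larger subsets.
        k = subset_sizes[i]
        return evaluate(k) - size_penalty * k / n_features

    # Try every action once so every reward estimate is defined.
    for i in range(r):
        totals[i] += score(i)
        counts[i] += 1

    for _ in range(iterations):
        # Sample an action from the current probability vector.
        i = rng.choices(range(r), weights=probs)[0]
        totals[i] += score(i)
        counts[i] += 1
        # Estimator step: the action with the highest mean score so far.
        estimates = [t / c for t, c in zip(totals, counts)]
        best = max(range(r), key=lambda j: estimates[j])
        # Pursuit update: move a fraction of probability mass to `best`.
        probs = [(1.0 - learning_rate) * p for p in probs]
        probs[best] += learning_rate

    # Return the subset size the automaton has converged toward.
    return subset_sizes[max(range(r), key=lambda j: probs[j])]


# Synthetic stand-in for a real measured accuracy (illustration only):
# accuracy rises with k and saturates at 0.95 from k = 200 onward.
_noise = random.Random(42)


def toy_accuracy(k: int) -> float:
    return min(0.95, 0.6 + 0.35 * k / 200) + _noise.gauss(0.0, 0.002)


print(ess_sketch([50, 100, 200, 400, 800], toy_accuracy, n_features=1000))
```

With the default penalty, the automaton in this toy setting settles on 200 features (full accuracy at the lowest size cost); raising size_penalty to 2.0 pushes it toward 50. A real use would replace toy_accuracy with the learner's measured accuracy on each top-k subset, and the filter ranking and scoring rule used in the paper may well differ.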
Pages: 17