An enhanced algorithm for semantic-based feature reduction in spam filtering

Cited by: 0
Authors
Novo-Lourés, María [1, 2, 3]
Pavón, Reyes [1, 2, 3]
Laza, Rosalía [1, 2, 3]
Méndez, José R. [1, 2, 3]
Ruano-Ordás, David [1, 2, 3]
Affiliations
[1] CINBIO - Biomedical Research Centre, CINBIO, Pontevedra, Vigo, Spain
[2] Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Pontevedra, Vigo, Spain
[3] Department of Computer Science, ESEI - Escola Superior de Enxeñaría Informática, Edificio Politécnico, Universidade de Vigo, Ourense, Ourense, Spain
Keywords
Dimensionality reduction
DOI
10.7717/PEERJ-CS.2206
Abstract
With the advent and improvement of ontological dictionaries (WordNet, BabelNet), the use of synset-based text representations is gaining popularity in classification tasks. More recently, ontological dictionaries have been used to reduce dimensionality in this kind of representation (e.g., Semantic Dimensionality Reduction System (SDRS) (Vélez de Mendizabal et al., 2020)). These approaches combine semantically related columns by taking advantage of semantic information extracted from ontological dictionaries. Their main advantage is that they not only eliminate features but can also combine them, minimizing (low-loss) or avoiding (lossless) the loss of information. The most recent (and accurate) techniques in this group use evolutionary algorithms to find how many features can be grouped so as to reduce the false positive (FP) and false negative (FN) errors obtained. The main limitation of these evolutionary-based schemes is the computational cost of the optimization algorithms. The contribution of this study is a new lossless feature reduction scheme exploiting information from ontological dictionaries, which achieves slightly better accuracy (especially in FP errors) than optimization-based approaches while using far fewer computational resources. Instead of using computationally expensive evolutionary algorithms, our proposal determines whether two columns (synsets) can be combined by observing whether the instances in a dataset (e.g., the training dataset) that contain these synsets are mostly of the same class. The study includes experiments using three datasets and a detailed comparison with two previous optimization-based approaches. © 2024 Novo-Lourés et al.
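The merge criterion described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 0.9 purity threshold, the reading of "containing these synsets" as "containing either synset", the summing of counts on merge, and the `syn_*` identifiers are all assumptions made for illustration. The candidate pair is also assumed to have already been judged semantically related by an ontological dictionary, a step the sketch does not cover.

```python
from collections import Counter

def class_purity(rows, labels, columns):
    """Share of the majority class among instances containing any of `columns`.

    `rows` is a list of dicts mapping a synset id to its count in one message.
    Assumption: an instance "contains" a synset when its count is above zero.
    """
    hits = [label for row, label in zip(rows, labels)
            if any(row.get(c, 0) > 0 for c in columns)]
    if not hits:
        return 0.0
    return Counter(hits).most_common(1)[0][1] / len(hits)

def can_merge(rows, labels, a, b, threshold=0.9):
    """Hypothetical rule: merge when the instances are mostly of one class."""
    return class_purity(rows, labels, (a, b)) >= threshold

def merge_columns(rows, a, b, merged):
    """Replace columns `a` and `b` with one column `merged` holding their sum."""
    out = []
    for row in rows:
        new = {k: v for k, v in row.items() if k not in (a, b)}
        total = row.get(a, 0) + row.get(b, 0)
        if total:
            new[merged] = total
        out.append(new)
    return out

# Toy data with made-up synset identifiers.
rows = [
    {"syn_pill": 2, "syn_offer": 1},
    {"syn_drug": 1},
    {"syn_pill": 1, "syn_drug": 1},
    {"syn_meeting": 1, "syn_offer": 1},
    {"syn_meeting": 2},
]
labels = ["spam", "spam", "spam", "ham", "ham"]

if can_merge(rows, labels, "syn_pill", "syn_drug"):
    reduced = merge_columns(rows, "syn_pill", "syn_drug", "syn_medicine")
```

In this toy data every message containing `syn_pill` or `syn_drug` is spam, so the pair passes the check, whereas `syn_offer` and `syn_meeting` appear in both classes and would be left separate. A single pass over the dataset per candidate pair is what makes a rule of this kind cheaper than an evolutionary search.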
Related papers
  • [1] An enhanced algorithm for semantic-based feature reduction in spam filtering
    Novo-Loures, Maria
    Pavon, Reyes
    Laza, Rosalia
    Mendez, Jose R.
    Ruano-Ordas, David
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [2] A new semantic-based feature selection method for spam filtering
    Mendez, Jose R.
    Cotos-Yanez, Tomas R.
    Ruano-Ordas, David
    APPLIED SOFT COMPUTING, 2019, 76 : 89 - 104
  • [3] A semantic-based classification approach for an enhanced spam detection
    Saidani, Nadjate
    Adi, Kamel
    Allili, Mohand Said
    COMPUTERS & SECURITY, 2020, 94
  • [4] A semantic-based model with a hybrid feature engineering process for accurate spam detection
    Chira N. Mohammed
    Ayah M. Ahmed
    Journal of Electrical Systems and Information Technology, 11 (1)
  • [5] Semantic-Based Feature Reduction Approach for E-mail Classification
    Bahgat, Eman M.
    Moawad, Ibrahim F.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT SYSTEMS AND INFORMATICS 2016, 2017, 533 : 53 - 63
  • [6] An Enhanced Semantic-based Cache Replacement Algorithm for Web Systems
    Xuan Tung Hoang
    Ngoc Dung Bui
    2019 IEEE - RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES (RIVF), 2019, : 142 - 147
  • [7] Spam filtering based on latent semantic indexing
    Gansterer, Wilfried N.
    Janecek, Andreas G. K.
    Neumayer, Robert
    SURVEY OF TEXT MINING II: CLUSTERING, CLASSIFICATION, AND RETRIEVAL, 2008, : 165 - +
  • [8] A new feature selection algorithm based on binomial hypothesis testing for spam filtering
    Yang, Jieming
    Liu, Yuanning
    Liu, Zhen
    Zhu, Xiaodong
    Zhang, Xiaoxu
    KNOWLEDGE-BASED SYSTEMS, 2011, 24 (06) : 904 - 914
  • [9] A Semantic-based Algorithm for Microblogs Clustering
    Miao, Jiajia
    Chen, Guoyou
    Wang, Le
    Fang, Xuelin
    ADVANCES IN MECHATRONICS AND CONTROL ENGINEERING, PTS 1-3, 2013, 278-280 : 1174 - +
  • [10] Deep semantic-Based Feature Envy Identification
    Guo, Xueliang
    Shi, Chongyang
    Jiang, He
    11TH ASIA-PACIFIC SYMPOSIUM ON INTERNETWARE (INTERNETWARE 2019), 2019,