An enhanced algorithm for semantic-based feature reduction in spam filtering

Cited by: 0
Authors
Novo-Lourés, María [1 ,2 ,3 ]
Pavón, Reyes [1 ,2 ,3 ]
Laza, Rosalía [1 ,2 ,3 ]
Méndez, José R. [1 ,2 ,3 ]
Ruano-Ordás, David [1 ,2 ,3 ]
Affiliations
[1] CINBIO - Biomedical Research Centre, CINBIO, Pontevedra, Vigo, Spain
[2] Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Pontevedra, Vigo, Spain
[3] Department of Computer Science, ESEI - Escola Superior de Enxeñaría Informática, Edificio Politécnico, Universidade de Vigo, Ourense, Ourense, Spain
Keywords
Dimensionality reduction
DOI
10.7717/PEERJ-CS.2206
Abstract
With the advent and improvement of ontological dictionaries (WordNet, BabelNet), the use of synset-based text representations is gaining popularity in classification tasks. More recently, ontological dictionaries have been used to reduce dimensionality in this kind of representation (e.g., Semantic Dimensionality Reduction System (SDRS) (Vélez de Mendizabal et al., 2020)). These approaches combine semantically related columns by taking advantage of semantic information extracted from ontological dictionaries. Their main advantage is that they not only eliminate features but can also combine them, minimizing (low-loss) or avoiding (lossless) the loss of information. The most recent (and accurate) techniques in this group use evolutionary algorithms to find how many features can be grouped while reducing the false positive (FP) and false negative (FN) errors obtained. The main limitation of these evolutionary-based schemes is the computational cost of the optimization algorithms. The contribution of this study is a new lossless feature reduction scheme exploiting information from ontological dictionaries, which achieves slightly better accuracy (especially in FP errors) than optimization-based approaches while using far fewer computational resources. Instead of using computationally expensive evolutionary algorithms, our proposal determines whether two columns (synsets) can be combined by observing whether the instances in a dataset (e.g., the training dataset) that contain these synsets are mostly of the same class. The study includes experiments using three datasets and a detailed comparison with two previous optimization-based approaches. © 2024 Novo-Lourés et al.
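The merge criterion described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `threshold` parameter, the toy data, and the choice to sum the two columns when merging are all assumptions made for illustration. The sketch also assumes the two synsets have already been found to be semantically related in an ontological dictionary, a step that is omitted here.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, int]  # one instance: synset name -> occurrence count


def class_purity(rows: List[Row], labels: List[str],
                 synset: str) -> Tuple[Optional[str], float]:
    """Majority class among instances containing `synset`, and its share."""
    hits = [lab for row, lab in zip(rows, labels) if row.get(synset, 0) > 0]
    if not hits:
        return None, 0.0
    # sorted() makes tie-breaking deterministic
    majority = max(sorted(set(hits)), key=hits.count)
    return majority, hits.count(majority) / len(hits)


def can_merge(rows: List[Row], labels: List[str], a: str, b: str,
              threshold: float = 0.9) -> bool:
    """Hypothetical rule: both synsets appear mostly in the same class.

    `threshold` is an assumed cut-off for "mostly"; the abstract does not
    give a value.
    """
    label_a, share_a = class_purity(rows, labels, a)
    label_b, share_b = class_purity(rows, labels, b)
    return (label_a is not None and label_a == label_b
            and share_a >= threshold and share_b >= threshold)


def merge_columns(rows: List[Row], a: str, b: str, merged: str) -> List[Row]:
    """Replace columns `a` and `b` by one column holding their sum (assumed)."""
    out = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in (a, b)}
        total = row.get(a, 0) + row.get(b, 0)
        if total:
            new_row[merged] = total
        out.append(new_row)
    return out


# Toy data, invented for illustration only.
rows = [
    {"drug": 1, "medicine": 2},
    {"medicine": 1},
    {"drug": 3},
    {"meeting": 1},
    {"meeting": 1, "drug": 1},
]
labels = ["spam", "spam", "spam", "ham", "ham"]

if can_merge(rows, labels, "drug", "medicine", threshold=0.6):
    reduced = merge_columns(rows, "drug", "medicine", "drug+medicine")
```

In this toy data, "drug" appears in two spam messages and one ham message, so the pair passes a 0.6 threshold but not a 0.9 one. The criterion needs only one pass over the labelled instances per candidate pair, which is consistent with the abstract's claim of lower cost than an evolutionary search.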
Related papers
50 results in total
  • [31] Facial feature model fitting in semantic-based scene analysis
    Antoszczyszyn, PM
    Hannah, JM
    Grant, PM
    ELECTRONICS LETTERS, 1997, 33 (10) : 855 - 857
  • [32] Applications of Text Clustering Based on Semantic Body for Chinese Spam Filtering
    Zhang, Qiu-Yu
    Wang, Peng
    Yang, Hui-Juan
    JOURNAL OF COMPUTERS, 2012, 7 (11) : 2612 - 2616
  • [33] Semantic spam filtering from personalized ontologies
    Eyharabide, Victoria
    Amandi, Analia
    JOURNAL OF WEB ENGINEERING, 2008, 7 (02) : 158 - 176
  • [34] Spam Filtering by Semantic Indexing and Duplicate Detection
    Sonia
    2015 INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION & AUTOMATION (ICCCA), 2015, : 833 - 836
  • [35] Semantic-Based Mappings
    Mecca, Giansalvatore
    Rull, Guillem
    Santoro, Donatello
    Teniente, Ernest
    CONCEPTUAL MODELING, ER 2013, 2013, 8217 : 255 - +
  • [36] Semantic-based bandwidth reduction in wide area training networks
    Bassiouni, MA
    Chin, MH
    SECOND IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS, PROCEEDINGS, 1997, : 292 - 296
  • [37] A Local-Concentration-Based Feature Extraction Approach for Spam Filtering
    Zhu, Yuanchun
    Tan, Ying
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2011, 6 (02) : 486 - 497
  • [38] Feature Selection and Similarity Coefficient Based Method for Email Spam Filtering
    Abdelrahim, Ali Ahmed A.
    Elhadi, Ammar Ahmed E.
    Ibrahim, Hamza
    Elmisbah, Naser
    2013 INTERNATIONAL CONFERENCE ON COMPUTING, ELECTRICAL AND ELECTRONICS ENGINEERING (ICCEEE), 2013, : 630 - 633
  • [39] A Semantic-based Recommender System Using A Simulated Annealing Algorithm
    Picot-Clemente, Romain
    Cruz, Christophe
    Nicolle, Christophe
    SEMAPRO 2010: THE FOURTH INTERNATIONAL CONFERENCE ON ADVANCES IN SEMANTIC PROCESSING, 2010, : 132 - 137
  • [40] A Semantic-Based Hoist Mutation Operator for Evolutionary Feature Construction in Regression
    Zhang, Hengzhe
    Chen, Qi
    Xue, Bing
    Banzhaf, Wolfgang
    Zhang, Mengjie
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2024, 28 (06) : 1689 - 1703