An enhanced algorithm for semantic-based feature reduction in spam filtering

Authors
Novo-Lourés, María [1, 2, 3]
Pavón, Reyes [1, 2, 3]
Laza, Rosalía [1, 2, 3]
Méndez, José R. [1, 2, 3]
Ruano-Ordás, David [1, 2, 3]
Affiliations
[1] CINBIO - Biomedical Research Centre, CINBIO, Pontevedra, Vigo, Spain
[2] Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Pontevedra, Vigo, Spain
[3] Department of Computer Science, ESEI - Escola Superior de Enxeñaría Informática, Edificio Politécnico, Universidade de Vigo, Ourense, Ourense, Spain
Keywords
Dimensionality reduction
DOI
10.7717/PEERJ-CS.2206
Abstract
With the advent and improvement of ontological dictionaries (WordNet, BabelNet), synset-based text representations are gaining popularity in classification tasks. More recently, ontological dictionaries have been used to reduce dimensionality in this kind of representation (e.g., Semantic Dimensionality Reduction System (SDRS) (Vélez de Mendizabal et al., 2020)). These approaches combine semantically related columns by taking advantage of semantic information extracted from ontological dictionaries. Their main advantage is that they not only eliminate features but can also combine them, minimizing (low-loss) or avoiding (lossless) the loss of information. The most recent (and accurate) techniques in this group use evolutionary algorithms to find how many features can be grouped while reducing the false positive (FP) and false negative (FN) errors obtained. The main limitation of these evolutionary schemes is the computational cost of the optimization algorithms they rely on. The contribution of this study is a new lossless feature reduction scheme that exploits information from ontological dictionaries and achieves slightly better accuracy (especially in FP errors) than optimization-based approaches while using far fewer computational resources. Instead of using computationally expensive evolutionary algorithms, our proposal determines whether two columns (synsets) can be combined by observing whether the instances in a dataset (e.g., the training dataset) that contain these synsets are mostly of the same class. The study includes experiments using three datasets and a detailed comparison with two previous optimization-based approaches. © 2024 Novo-Lourés et al.
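The class-agreement check described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 0.9 threshold, the use of "either synset present" to select instances, and the merge-by-sum step are all assumptions made for illustration, and the abstract does not specify how semantically related candidate pairs are chosen from the ontological dictionary.

```python
from typing import List, Sequence


def class_agreement(col_a: Sequence[int], col_b: Sequence[int],
                    labels: Sequence[str]) -> float:
    """Share of the majority class among instances containing either synset.

    Assumption: an instance "contains" a synset when its count is > 0.
    """
    present = [y for a, b, y in zip(col_a, col_b, labels) if a > 0 or b > 0]
    if not present:
        return 0.0  # neither synset occurs, so there is no evidence to merge
    majority_count = max(present.count(label) for label in set(present))
    return majority_count / len(present)


def can_merge(col_a: Sequence[int], col_b: Sequence[int],
              labels: Sequence[str], threshold: float = 0.9) -> bool:
    """Hypothetical rule: merge when instances are 'mostly' of one class.

    The threshold value is an assumption, not taken from the paper.
    """
    return class_agreement(col_a, col_b, labels) >= threshold


def merge_columns(col_a: Sequence[int], col_b: Sequence[int]) -> List[int]:
    """Combine two synset columns by summing counts (an assumed merge rule)."""
    return [a + b for a, b in zip(col_a, col_b)]


# Toy data: per-message counts for two synsets and the message classes.
synset_a = [1, 0, 2, 0, 0]
synset_b = [0, 1, 0, 0, 1]
labels = ["spam", "spam", "spam", "ham", "ham"]

print(class_agreement(synset_a, synset_b, labels))
print(can_merge(synset_a, synset_b, labels, threshold=0.7))
print(merge_columns(synset_a, synset_b))
```

In this toy dataset four messages contain at least one of the two synsets and three of them are spam, so whether the pair is merged depends entirely on the chosen threshold. A check of this kind needs only one pass over the two columns, which is consistent with the abstract's claim of lower computational cost than evolutionary search.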