An enhanced algorithm for semantic-based feature reduction in spam filtering

Authors
Novo-Lourés, María [1, 2, 3]
Pavón, Reyes [1, 2, 3]
Laza, Rosalía [1, 2, 3]
Méndez, José R. [1, 2, 3]
Ruano-Ordás, David [1, 2, 3]
Affiliations
[1] CINBIO - Biomedical Research Centre, CINBIO, Pontevedra, Vigo, Spain
[2] Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Pontevedra, Vigo, Spain
[3] Department of Computer Science, ESEI - Escola Superior de Enxeñaría Informática, Edificio Politécnico, Universidade de Vigo, Ourense, Ourense, Spain
Keywords
Dimensionality reduction
DOI
10.7717/PEERJ-CS.2206
Abstract
With the advent and improvement of ontological dictionaries (WordNet, BabelNet), the use of synset-based text representations is gaining popularity in classification tasks. More recently, ontological dictionaries have been used to reduce dimensionality in this kind of representation (e.g., Semantic Dimensionality Reduction System (SDRS) (Vélez de Mendizabal et al., 2020)). These approaches combine semantically related columns by taking advantage of semantic information extracted from ontological dictionaries. Their main advantage is that they not only eliminate features but can also combine them, minimizing (low-loss) or avoiding (lossless) the loss of information. The most recent (and accurate) techniques in this group use evolutionary algorithms to find how many features can be grouped while reducing the false positive (FP) and false negative (FN) errors obtained. The main limitation of these evolutionary-based schemes is the computational cost of the optimization algorithms. The contribution of this study is a new lossless feature reduction scheme exploiting information from ontological dictionaries, which achieves slightly better accuracy (especially in FP errors) than optimization-based approaches while using far fewer computational resources. Instead of using computationally expensive evolutionary algorithms, our proposal determines whether two columns (synsets) can be combined by observing whether the instances in a dataset (e.g., the training dataset) that contain these synsets are mostly of the same class. The study includes experiments using three datasets and a detailed comparison with two previous optimization-based approaches. © 2024 Novo-Lourés et al.
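
The merge criterion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact rule, so the code assumes that "mostly of the same class" means the majority-class share among instances containing either synset meets a threshold, that the two columns have already been identified as semantically related through an ontological dictionary, and that merging sums the two counts. The names class_purity, can_merge, merge_columns, the threshold value, and the synset identifiers are all hypothetical.

```python
from collections import Counter


def class_purity(rows, labels, columns):
    """Majority-class share among instances containing any of `columns`.

    Assumption: an instance "contains" a synset when its count is > 0.
    """
    counts = Counter(
        label
        for row, label in zip(rows, labels)
        if any(row.get(col, 0) > 0 for col in columns)
    )
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


def can_merge(rows, labels, col_a, col_b, threshold=0.9):
    """Hypothetical rule: merge when the instances are mostly one class.

    The threshold is an illustrative value, not taken from the paper.
    """
    return class_purity(rows, labels, [col_a, col_b]) >= threshold


def merge_columns(rows, col_a, col_b, merged_name):
    """Replace two synset columns with one column holding their summed counts."""
    merged = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in (col_a, col_b)}
        total = row.get(col_a, 0) + row.get(col_b, 0)
        if total:
            new_row[merged_name] = total
        merged.append(new_row)
    return merged


# Toy training set: each row maps a (made-up) synset id to its count.
rows = [
    {"syn_money": 2, "syn_currency": 1},
    {"syn_money": 1},
    {"syn_currency": 3},
    {"syn_meeting": 1},
    {"syn_meeting": 2, "syn_money": 1},
]
labels = ["spam", "spam", "spam", "ham", "ham"]

purity = class_purity(rows, labels, ["syn_money", "syn_currency"])
if can_merge(rows, labels, "syn_money", "syn_currency", threshold=0.7):
    reduced = merge_columns(rows, "syn_money", "syn_currency", "syn_finance")
else:
    reduced = rows
```

In this toy data, four instances contain one of the two synsets and three of them are spam, so the pair is merged at a threshold of 0.7 but not at the stricter default. A check of this kind needs only a single pass over the dataset per candidate pair, which is consistent with the abstract's claim of lower computational cost than evolutionary search.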