Term Space Partition Based Ensemble Feature Construction for Spam Detection

Cited by: 3
Authors
Mi, Guyue [1, 2]
Gao, Yang [1, 2]
Tan, Ying [1, 2]
Affiliations
[1] Peking Univ, Key Lab Machine Percept MOE, Beijing 100871, Peoples R China
[2] Peking Univ, Sch Elect Engn & Comp Sci, Dept Machine Intelligence, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Term space partition (TSP); Ensemble term space partition (ETSP); Feature construction; Spam detection; Text categorization
DOI
10.1007/978-3-319-40973-3_20
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes an ensemble feature construction method for spam detection based on the term space partition (TSP) approach. TSP aims to make fuller and more rational use of terms by dividing the original term space and constructing discriminative features on the distinct subspaces. The ensemble features combine global and local features of emails, and a variable-length sliding window technique is adopted. Experiments conducted on five benchmark corpora suggest that the ensemble feature construction method far outperforms not only the traditional and most widely used bag-of-words model but also the heuristic, state-of-the-art immune concentration based feature construction approaches. Compared with the original TSP approach, the ensemble method achieves better performance and robustness, offering a more reliable alternative for different application scenarios.
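The idea of combining global and local features over a partitioned term space can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the partition criterion, the feature definitions, or the window scheme, so everything below is an assumption made for illustration. The hypothetical helper `partition_terms` splits the vocabulary by which class a term appears in more often, `subspace_ratios` measures the share of distinct tokens falling in each subspace, and `ensemble_features` concatenates one global measurement with measurements from `n_windows` non-overlapping windows whose length scales with the email length.

```python
from collections import Counter


def partition_terms(spam_docs, ham_docs):
    """Assumed partition rule: assign each term to the class in which
    it has the higher document-frequency rate; ties are left out."""
    spam_df = Counter(t for doc in spam_docs for t in set(doc))
    ham_df = Counter(t for doc in ham_docs for t in set(doc))
    spam_terms, ham_terms = set(), set()
    for term in set(spam_df) | set(ham_df):
        spam_rate = spam_df[term] / len(spam_docs)
        ham_rate = ham_df[term] / len(ham_docs)
        if spam_rate > ham_rate:
            spam_terms.add(term)
        elif ham_rate > spam_rate:
            ham_terms.add(term)
    return spam_terms, ham_terms


def subspace_ratios(tokens, spam_terms, ham_terms):
    """Share of distinct tokens that fall in each term subspace."""
    distinct = set(tokens)
    if not distinct:
        return [0.0, 0.0]
    return [
        len(distinct & spam_terms) / len(distinct),
        len(distinct & ham_terms) / len(distinct),
    ]


def ensemble_features(tokens, spam_terms, ham_terms, n_windows=3):
    """Global ratios followed by per-window (local) ratios.

    The window length varies with the email length so that every email
    yields a vector of the same size: 2 * (1 + n_windows).
    """
    features = subspace_ratios(tokens, spam_terms, ham_terms)
    size = max(1, -(-len(tokens) // n_windows))  # ceiling division
    for i in range(n_windows):
        window = tokens[i * size:(i + 1) * size]
        features.extend(subspace_ratios(window, spam_terms, ham_terms))
    return features


# Toy corpora, invented purely for illustration.
spam_docs = [["free", "win", "now"], ["free", "prize"]]
ham_docs = [["meeting", "now"], ["project", "meeting"]]
spam_terms, ham_terms = partition_terms(spam_docs, ham_docs)
email = ["free", "prize", "now", "meeting"]
print(ensemble_features(email, spam_terms, ham_terms, n_windows=2))
```

Fixing the number of windows rather than the window length is one way to obtain fixed-size vectors from emails of different lengths; the resulting vector could then be passed to any standard classifier.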
Pages: 205-216
Number of pages: 12