Term Space Partition Based Ensemble Feature Construction for Spam Detection

Times cited: 3
Authors
Mi, Guyue [1, 2]
Gao, Yang [1, 2]
Tan, Ying [1, 2]
Affiliations
[1] Peking Univ, Key Lab Machine Percept MOE, Beijing 100871, Peoples R China
[2] Peking Univ, Sch Elect Engn & Comp Sci, Dept Machine Intelligence, Beijing 100871, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
Term space partition (TSP); Ensemble term space partition (ETSP); Feature construction; Spam detection; Text categorization;
DOI
10.1007/978-3-319-40973-3_20
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
This paper proposes an ensemble feature construction method for spam detection based on the term space partition (TSP) approach, which aims to let terms play more effective and appropriate roles by dividing the original term space into subspaces and constructing discriminative features on each distinct subspace. The ensemble features take both global and local features of an email into account, using a variable-length sliding window technique. Experiments on five benchmark corpora suggest that the ensemble feature construction method substantially outperforms not only the traditional and most widely used bag-of-words model, but also the heuristic, state-of-the-art feature construction approaches based on immune concentration. Compared with the original TSP approach, the ensemble method achieves better performance and robustness, offering a more reliable alternative for different application scenarios.
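The abstract does not specify how the term space is partitioned or which statistics are computed on each subspace, so the following is only a minimal sketch of the general idea of combining one global feature with local features taken from sliding windows of several lengths. Everything in it is an assumption for illustration, not the authors' TSP/ETSP method: the two term sets `spam_terms` and `ham_terms`, the hit-ratio statistic, the hypothetical `window_fractions` parameter, the 50% window overlap, and the choice to keep the most spam-leaning window.

```python
from typing import List, Sequence, Set, Tuple


def subspace_ratios(tokens: Sequence[str], spam_terms: Set[str],
                    ham_terms: Set[str]) -> List[float]:
    """Fraction of tokens that fall into each (hypothetical) term subset."""
    n = len(tokens)
    if n == 0:
        return [0.0, 0.0]
    spam_hits = sum(1 for t in tokens if t in spam_terms)
    ham_hits = sum(1 for t in tokens if t in ham_terms)
    return [spam_hits / n, ham_hits / n]


def ensemble_features(tokens: Sequence[str], spam_terms: Set[str],
                      ham_terms: Set[str],
                      window_fractions: Tuple[float, ...] = (0.5, 0.25)) -> List[float]:
    """Concatenate one global feature pair with one local pair per window length."""
    # Global view: statistics computed over the whole email.
    features = subspace_ratios(tokens, spam_terms, ham_terms)
    n = len(tokens)
    for frac in window_fractions:
        # Variable-length window: its size scales with email length (assumed rule).
        w = max(1, int(n * frac))
        step = max(1, w // 2)  # assumed 50% overlap between consecutive windows
        starts = range(0, max(n - w, 0) + 1, step)
        per_window = [subspace_ratios(tokens[i:i + w], spam_terms, ham_terms)
                      for i in starts]
        # Local view: keep the window leaning most towards the spam subset (assumed).
        best = max(per_window, key=lambda r: r[0] - r[1])
        features.extend(best)
    return features


if __name__ == "__main__":
    email = "free money now click here meeting tomorrow".split()
    print(ensemble_features(email, {"free", "money", "click"},
                            {"meeting", "tomorrow"}))
```

With two window fractions the sketch yields a fixed-length vector of six values (one global pair plus one local pair per window length), which could then be passed to any standard classifier; a real implementation would derive the term subsets from training data rather than hard-coding them.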
Pages: 205-216
Number of pages: 12