Term Space Partition Based Ensemble Feature Construction for Spam Detection

Cited by: 3
Authors
Mi, Guyue [1, 2]
Gao, Yang [1, 2]
Tan, Ying [1, 2]
Affiliations
[1] Peking Univ, Key Lab Machine Percept MOE, Beijing 100871, Peoples R China
[2] Peking Univ, Sch Elect Engn & Comp Sci, Dept Machine Intelligence, Beijing 100871, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
Term space partition (TSP); Ensemble term space partition (ETSP); Feature construction; Spam detection; Text categorization;
DOI
10.1007/978-3-319-40973-3_20
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
This paper proposes an ensemble feature construction method for spam detection based on the term space partition (TSP) approach. TSP aims to let terms play fuller and more rational roles by dividing the original term space and constructing discriminative features on the distinct subspaces. The ensemble features take both global and local features of emails into account, with the local features obtained through a variable-length sliding window technique. Experiments on five benchmark corpora suggest that the ensemble feature construction method far outperforms not only the traditional and most widely used bag-of-words model, but also the heuristic, state-of-the-art immune concentration based feature construction approaches. Compared to the original TSP approach, the ensemble method achieves better performance and robustness, providing an alternative mechanism of reliability for different application scenarios.
Pages: 205-216
Number of pages: 12