Support vector machines for spam categorization

Cited by: 772
Authors
Drucker, H [1]
Wu, DH
Vapnik, VN
Affiliations
[1] AT&T Bell Labs, Res, Red Bank, NJ 07701 USA
[2] Monmouth Univ, Dept Elect Engn, W Long Branch, NJ 07764 USA
[3] Rensselaer Polytech Inst, Troy, NY 12181 USA
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 1999 / Vol. 10 / No. 5
Keywords
boosting algorithms; classification; e-mail; feature representation; Ripper; Rocchio; support vector machines
DOI
10.1109/72.788645
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We study the use of support vector machines (SVM's) in classifying e-mail as spam or nonspam by comparing it to three other classification algorithms: Ripper, Rocchio, and boosting decision trees. These four algorithms were tested on two different data sets: one data set where the number of features was constrained to the 1000 best features and another data set where the dimensionality was over 7000. SVM's performed best when using binary features. For both data sets, boosting trees and SVM's had acceptable test performance in terms of accuracy and speed. However, SVM's had significantly less training time.
Pages: 1048-1054
Page count: 7