Intelligent SMS Spam Filtering Using Topic Model

Cited by: 14
Authors
Ma, Jialin [1, 2]
Zhang, Yongjun [1, 2]
Liu, Jinling [1 ]
Yu, Kun [1 ]
Wang, XuAn [3 ]
Affiliations
[1] Huaiyin Inst Technol, Huaian, Peoples R China
[2] Hohai Univ, Nanjing, Jiangsu, Peoples R China
[3] CAPF, Engn Univ, Xian, Peoples R China
Keywords
SMS Spam; Topic Model; LDA; MTM
DOI
10.1109/INCoS.2016.47
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Spam messages are now widespread in many countries. They seriously violate personal rights and may even harm national security. Existing filtering techniques usually use traditional text classifiers, which are better suited to normal long texts; they therefore face serious challenges with SMS messages, such as data sparsity and noisy data. This work proposes a message topic model (MTM) for SMS spam filtering. The MTM is derived from the probabilistic topic model, but it differs from standard Latent Dirichlet Allocation (LDA) in two respects: (1) To overcome the sparsity problem in SMS message classification, the standard K-means algorithm is first used to group the training data into rough classes, and all the spam messages of a class are then aggregated into a single document. (2) Symbol semantics is taken into account. Preprocessing rules and background terms are introduced so that the model represents SMS spam more fully. Finally, we compare the MTM with the SVM and standard LDA on the public SMS spam corpus. The experimental results show that the MTM is more effective for the task of SMS spam filtering.
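The first point of the abstract (rough K-means grouping, then merging each group into one document) can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the cosine similarity measure, the choice of the first k messages as initial centroids, the toy messages, and all function names are assumptions made for illustration. The topic-model step that would follow is not shown.

```python
from collections import Counter
import math


def bag_of_words(message):
    """Term counts from lowercase whitespace tokens (an assumed tokenizer)."""
    return Counter(message.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Component-wise mean of sparse vectors, used as a cluster centroid."""
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors)
    return Counter({t: c / n for t, c in total.items()})


def kmeans(vectors, k, iterations=10):
    """Plain K-means; the first k vectors seed the centroids (an assumption)."""
    centroids = [Counter(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every message to its most similar centroid.
        assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return assignment


def aggregate(messages, assignment, k):
    """Concatenate the messages of each cluster into one pseudo-document."""
    groups = [[] for _ in range(k)]
    for message, cluster in zip(messages, assignment):
        groups[cluster].append(message)
    return [" ".join(group) for group in groups]


# Toy spam messages, invented for this example.
messages = [
    "win free prize now",
    "cheap loan offer call now",
    "free prize claim win",
    "loan offer cheap rates",
]
vectors = [bag_of_words(m) for m in messages]
assignment = kmeans(vectors, k=2)
documents = aggregate(messages, assignment, k=2)
for doc in documents:
    print(doc)
```

Each aggregated string is longer than any single message, which is the property the abstract relies on to reduce sparsity before a topic model is fitted.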
Pages: 380-383
Page count: 4