Intelligent SMS Spam Filtering Using Topic Model

Cited by: 14
Authors
Ma, Jialin [1, 2]
Zhang, Yongjun [1, 2]
Liu, Jinling [1 ]
Yu, Kun [1 ]
Wang, XuAn [3 ]
Affiliations
[1] Huaiyin Inst Technol, Huaian, Peoples R China
[2] Hohai Univ, Nanjing, Jiangsu, Peoples R China
[3] CAPF, Engn Univ, Xian, Peoples R China
Keywords
SMS Spam; Topic Model; LDA; MTM;
DOI
10.1109/INCoS.2016.47
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Spam messages are now widespread in many countries. They seriously violate personal rights and may even harm national security. Existing filtering techniques usually use traditional text classifiers, which are better suited to normal long texts; they therefore face serious challenges on SMS messages, such as data sparsity and noisy data. This work proposes a message topic model (MTM) for SMS spam filtering. The MTM derives from the probabilistic topic model, but it differs from standard Latent Dirichlet Allocation (LDA) in two respects: (1) To overcome the sparsity problem in SMS message classification, the standard K-means algorithm is first used to group the training data into rough classes, and all the spam messages of a class are then aggregated into a single document. (2) Symbol semantics is taken into account: preprocessing rules and background terms are introduced so that the model represents SMS spam more fully. Finally, we compare the MTM with the SVM and standard LDA on a public SMS spam corpus. The experimental results show that the MTM is more effective for SMS spam filtering.
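The aggregation step in point (1) of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the features, the number of clusters, or the initialization, so the whitespace tokenizer, raw term counts, first-k centroid initialization, and the function names `vectorize`, `kmeans`, and `aggregate_by_cluster` are all assumptions made for illustration. A topic model would then be fitted on the aggregated pseudo-documents, which is not shown here.

```python
from collections import Counter


def vectorize(messages):
    """Turn each message into a term-count vector over a shared vocabulary.
    Assumption: lowercase whitespace tokens; the paper's features are not given."""
    vocab = sorted({tok for msg in messages for tok in msg.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for msg in messages:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(msg.lower().split()).items():
            vec[index[tok]] = float(count)
        vectors.append(vec)
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-means. Assumption: the first k vectors seed the centroids,
    chosen only so that the example is deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def aggregate_by_cluster(messages, labels):
    """Concatenate all messages of a cluster into one longer pseudo-document."""
    grouped = {}
    for msg, lab in zip(messages, labels):
        grouped.setdefault(lab, []).append(msg)
    return {lab: " ".join(msgs) for lab, msgs in grouped.items()}


# Hypothetical toy spam messages, not taken from any corpus.
spam = [
    "win free prize now",
    "cheap loan offer today",
    "free prize claim now",
    "loan offer cheap rates",
]
labels = kmeans(vectorize(spam), k=2)
pseudo_docs = aggregate_by_cluster(spam, labels)
for cluster_id, doc in sorted(pseudo_docs.items()):
    print(cluster_id, doc)
```

Each pseudo-document holds several short messages, so it has more word co-occurrences for a topic model to learn from than any single SMS, which is the motivation the abstract gives for aggregating.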
Pages: 380-383
Page count: 4