SMS spam filtering and thread identification using bi-level text classification and clustering techniques

Cited by: 29
Authors
Nagwani, Naresh Kumar [1]
Sharaff, Aakanksha [1]
Affiliations
[1] Natl Inst Technol, Dept Comp Sci & Engn, Raipur 492010, Madhya Pradesh, India
Keywords
K-means clustering; non-negative matrix factorization; SMS thread; SMS clustering; thread identification; thread identification in SMS
DOI
10.1177/0165551515616310
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
SMS spam detection is the task of identifying and filtering spam SMS messages. As more SMS messages are exchanged every day, it becomes very difficult for a user to remember newly received messages and relate them to previously received ones. SMS threads provide a solution to this problem. This work discusses the problem of SMS spam detection and thread identification and presents a state-of-the-art clustering-based algorithm. The work proceeds in two stages. In the first stage, binary classification is applied to categorize SMS messages as spam or non-spam; in the second stage, clusters are created from the non-spam messages using non-negative matrix factorization and K-means clustering. A threading-based similarity feature, namely the time between consecutive communications, is described for the identification of SMS threads, and the impact of the time threshold on thread identification is analysed experimentally. Performance measures such as accuracy, precision, recall and F-measure are also evaluated. The SMS threads identified in this work can be used in applications such as SMS thread summarization, SMS folder classification and other SMS management tasks.
Pages: 75-87
Page count: 13
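
The abstract names the time between consecutive communications as the feature used to identify threads. The sketch below is one possible reading of that idea, not the authors' algorithm: it assumes the messages already belong to one non-spam cluster and starts a new thread whenever the gap to the previous message exceeds a threshold. The function name `split_into_threads`, the `max_gap` parameter and the sample messages are all hypothetical.

```python
from datetime import datetime, timedelta


def split_into_threads(messages, max_gap):
    """Group (timestamp, text) pairs into threads.

    Assumption (not from the paper): a message joins the current thread
    when it arrives within max_gap of the previous message; otherwise it
    starts a new thread.
    """
    threads = []
    for ts, text in sorted(messages, key=lambda m: m[0]):
        # threads[-1][-1][0] is the timestamp of the latest message so far
        if threads and ts - threads[-1][-1][0] <= max_gap:
            threads[-1].append((ts, text))
        else:
            threads.append([(ts, text)])
    return threads


# Hypothetical messages from a single cluster
messages = [
    (datetime(2015, 1, 1, 9, 0), "Lunch today?"),
    (datetime(2015, 1, 1, 9, 5), "Sure, 1 pm?"),
    (datetime(2015, 1, 1, 18, 30), "Running late"),
]

short_gap_threads = split_into_threads(messages, timedelta(minutes=30))
long_gap_threads = split_into_threads(messages, timedelta(hours=12))
```

With these sample messages, a 30-minute threshold separates the evening message into its own thread, while a 12-hour threshold merges all three. This illustrates why the choice of time threshold matters for thread identification.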