Index-based Online Text Classification for SMS Spam Filtering

Cited by: 27
Authors
Liu, Wuying [1]
Wang, Ting [1]
Affiliation
[1] Natl Univ Def Technol, Sch Comp, Changsha, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Online Text Classification; SMS Spam Filtering; Ensemble Learning; Index Model; Spamminess Score; TREC Spam Track
DOI
10.4304/jcp.5.6.844-851
Chinese Library Classification number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
We propose a novel index-based online text classification method, investigate two index models, and compare the performance of various index granularities on English and Chinese SMS messages. Based on the proposed method, six individual classifiers were implemented, each using a different text feature of Chinese messages, and were then combined into an ensemble classifier. Experimental results on the English corpus show that features capturing relations among words can increase classification confidence, and that trigram co-occurrence of words is an appropriate feature of this kind. Experimental results on a real Chinese corpus show that a classifier using the word-level index model performs better than one using the document-level index model. Trigram segmentation outperforms exact segmentation for indexing, so Chinese text does not need to be segmented exactly when it is indexed with the proposed method. With parallel multi-threaded ensemble learning, the proposed method has constant time complexity, which is critical for large-scale data and online filtering.
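To make the idea of index-based online filtering with trigram segmentation concrete, the following is a minimal sketch and not the authors' implementation. The class name `TrigramIndexClassifier`, the character-level trigram granularity, and the Laplace-smoothed averaging used for the spamminess score are all assumptions chosen for illustration; the abstract does not specify the scoring formula or the index layout.

```python
from collections import defaultdict


def trigrams(text):
    """Overlapping character trigrams; no exact word segmentation is needed."""
    chars = text.lower().replace(" ", "")
    return [chars[i:i + 3] for i in range(len(chars) - 2)]


class TrigramIndexClassifier:
    """Hypothetical online classifier backed by a trigram -> label-count index."""

    def __init__(self):
        # Index entry layout (assumed): trigram -> [spam_count, ham_count]
        self.index = defaultdict(lambda: [0, 0])

    def score(self, text):
        """Return a spamminess score in [0, 1]; 0.5 means no evidence."""
        grams = trigrams(text)
        if not grams:
            return 0.5
        total = 0.0
        for g in grams:
            # .get() avoids creating index entries for unseen trigrams.
            spam, ham = self.index.get(g, (0, 0))
            total += (spam + 1) / (spam + ham + 2)  # Laplace smoothing (assumed)
        return total / len(grams)

    def train(self, text, is_spam):
        """Online update: count each distinct trigram once per message."""
        for g in set(trigrams(text)):
            self.index[g][0 if is_spam else 1] += 1


clf = TrigramIndexClassifier()
clf.train("WIN a FREE prize now", True)
clf.train("see you at lunch tomorrow", False)
print(clf.score("free prize") > 0.5)      # leans spam
print(clf.score("lunch tomorrow") < 0.5)  # leans ham
```

In this sketch each lookup is a hash access, so the cost of scoring a message depends on its length rather than on how many messages have been indexed. An ensemble in the spirit of the abstract could run several such classifiers over different features and combine their scores, but the combination rule is not given in the source.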
Pages: 844-851
Page count: 8