A Constant Time Complexity Spam Detection Algorithm for Boosting Throughput on Rule-Based Filtering Systems

Cited by: 14
Authors
Xia, Tian [1]
Affiliations
[1] Shanghai Polytech Univ, Comp & Informat Engn Dept, Shanghai 201209, Peoples R China
Keywords
Constant time complexity; hash forest; rule-based filtering; spam detection; throughput; support vector machines
DOI
10.1109/ACCESS.2020.2991328
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Along with the explosive growth of spam, anti-spam technologies, including rule-based approaches and machine learning, have developed rapidly. In the anti-spam industry, rule-based systems (RBS) have become the most prominent method for fighting spam because their rules can be enriched and updated remotely. However, filtering throughput remains a major challenge for RBS. In particular, the rapid spread of obfuscated words leads to frequent rule updates and extensive expansion of the rule vocabulary. These continually added obfuscated words slow down filtering and reduce throughput. This paper addresses the throughput problem and proposes a rule-based spam detection algorithm with constant time complexity. The algorithm's processing speed is constant and independent of the size of the rule set and its vocabulary. A new data structure, the Hash Forest, and a rule encoding method are developed to make constant time complexity possible. Instead of traversing every spam term in the rules, the proposed algorithm detects spam terms by checking only a very small portion of all terms. The experimental results show the effectiveness of the proposed algorithm.
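The abstract does not describe the Hash Forest or the rule encoding in detail, so the following is only a minimal sketch of the general idea it builds on: looking up each message token in a hash-based index costs roughly the same on average no matter how many spam terms the rules contain, whereas traversing every term grows with the vocabulary. A plain Python set stands in for the paper's data structure, and the names `build_term_index` and `detect_spam_terms`, the tokenization, and the sample terms are all illustrative assumptions.

```python
from typing import Iterable, List, Set


def build_term_index(rule_terms: Iterable[str]) -> Set[str]:
    """Collect all spam terms from the rules into a hash set.

    Assumption: a plain set is a stand-in here, NOT the paper's Hash Forest.
    """
    return {term.lower() for term in rule_terms}


def detect_spam_terms(message: str, index: Set[str]) -> List[str]:
    """Return the message tokens that appear in the spam-term index.

    Each membership test is an average-case O(1) hash lookup, so the work
    depends on the message length rather than on the vocabulary size.
    """
    hits = []
    for token in message.lower().split():
        token = token.strip(".,!?;:")  # naive punctuation trimming (assumption)
        if token in index:
            hits.append(token)
    return hits


# Hypothetical rule vocabulary, including one obfuscated variant.
index = build_term_index(["viagra", "v1agra", "lottery"])
print(detect_spam_terms("Win the LOTTERY now, cheap v1agra!", index))
# → ['lottery', 'v1agra']
```

Adding more obfuscated variants to the index enlarges the set but leaves the per-token lookup cost essentially unchanged, which is the property the abstract attributes to its own, more specialized structure.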
Pages: 82653 - 82661
Page count: 9
Related papers
50 in total
  • [1] Effective scheduling strategies for boosting performance on rule-based spam filtering frameworks
    Ruano-Ordas, D.
    Fdez-Glez, J.
    Fdez-Riverola, F.
    Mendez, J. R.
    JOURNAL OF SYSTEMS AND SOFTWARE, 2013, 86 (12) : 3151 - 3161
  • [2] Using new scheduling heuristics based on resource consumption information for increasing throughput on rule-based spam filtering systems
    Ruano-Ordas, David
    Fdez-Glez, Jorge
    Fdez-Riverola, Florentino
    Ramon Mendez, Jose
    SOFTWARE-PRACTICE & EXPERIENCE, 2016, 46 (08): : 1035 - 1051
  • [3] RuleSIM: a toolkit for simulating the operation and improving throughput of rule-based spam filters
    Department of Computer Science, University of Vigo, ESEI, Campus As Lagoas, Ourense
    32004, Spain
    Unknown
    2411-901, Portugal
    Software Pract Exper, 8 (1091-1108):
  • [4] RuleSIM: a toolkit for simulating the operation and improving throughput of rule-based spam filters
    Ruano-Ordas, David
    Fdez-Glez, Jorge
    Fdez-Riverola, Florentino
    Fernandes, Vitor Basto
    Ramon Mendez, Jose
    SOFTWARE-PRACTICE & EXPERIENCE, 2016, 46 (08): : 1091 - 1108
  • [5] A Rule Status Monitoring Algorithm for Rule-Based Intrusion Detection and Prevention Systems
    Turner, Claude
    Jeremiah, Rolston
    Richards, Dwight
    Joseph, Anthony
    COMPLEX ADAPTIVE SYSTEMS, 2016, 95 : 361 - 368
  • [6] ON THE STOCHASTIC COMPLEXITY OF LOOPS IN RULE-BASED EXPERT SYSTEMS
    SZABO, ME
    INFORMATION SCIENCES, 1992, 64 (03) : 233 - 249
  • [7] MEASURING THE COMPLEXITY OF RULE-BASED EXPERT-SYSTEMS
    CHEN, ZS
    SUEN, CY
    EXPERT SYSTEMS WITH APPLICATIONS, 1994, 7 (04) : 467 - 481
  • [8] Fault detection in Rule-based Software systems
    Wang, D
    Hao, RB
    Lee, D
    INFORMATION AND SOFTWARE TECHNOLOGY, 2003, 45 (12) : 865 - 871
  • [9] Rule-based Sleep-Apnea detection algorithm
    Pugliese, Luigi
    Guagnano, Michele
    Groppo, Sara
    Violante, Massimo
    Groppo, Riccardo
    2023 9TH INTERNATIONAL WORKSHOP ON ADVANCES IN SENSORS AND INTERFACES, IWASI, 2023, : 251 - 255
  • [10] Stopping Rule-Based Iterative Tree Search for Low-Complexity Detection in MIMO Systems
    Sah, Abhay Kumar
    Chaturvedi, A. K.
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2017, 16 (01) : 169 - 179