A Constant Time Complexity Spam Detection Algorithm for Boosting Throughput on Rule-Based Filtering Systems

Cited by: 14
Authors
Xia, Tian [1]
Affiliations
[1] Shanghai Polytech Univ, Comp & Informat Engn Dept, Shanghai 201209, Peoples R China
Keywords
Constant time complexity; hash forest; rule-based filtering; spam detection; throughput; support vector machines
DOI
10.1109/ACCESS.2020.2991328
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Along with the explosive growth of spam, anti-spam technologies, including rule-based approaches and machine learning, have developed rapidly as well. In the anti-spam industry, rule-based systems (RBS) have become the most prominent method for fighting spam because their rules can be enriched and updated remotely. However, filtering throughput has always been a major challenge for RBS. In particular, the explosive spread of obfuscated words leads to frequent rule updates and extensive expansion of the rule vocabulary. These continually added obfuscated words slow down filtering and reduce throughput. This paper addresses the throughput issue and proposes a rule-based spam detection algorithm with constant time complexity. The algorithm has a constant processing speed that is independent of the rules and their vocabulary size. A new data structure, the Hash Forest, and a rule encoding method are developed to make constant time complexity possible. Instead of traversing every spam term in the rules, the proposed algorithm detects spam terms by checking only a very small portion of all terms. The experimental results show the effectiveness of the proposed algorithm.
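The general idea of avoiding a scan over every spam term can be illustrated with an ordinary hash index. The sketch below is not the paper's Hash Forest or its rule encoding, whose details this record does not give; it only assumes that rules are sets of literal terms and that messages are split on whitespace. The names `build_term_index`, `match_rules`, and the sample rules are hypothetical.

```python
from typing import Dict, Iterable, Set


def build_term_index(rules: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each spam term to the names of the rules that contain it."""
    index: Dict[str, Set[str]] = {}
    for rule_name, terms in rules.items():
        for term in terms:
            index.setdefault(term.lower(), set()).add(rule_name)
    return index


def match_rules(message: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Return the names of the rules triggered by the message.

    Each token costs one average-case O(1) dictionary lookup, so the work
    grows with the message length rather than with the number of rule terms.
    """
    hits: Set[str] = set()
    for token in message.lower().split():
        hits |= index.get(token, set())
    return hits


# Hypothetical rules, including obfuscated spellings of the same word.
rules = {
    "pharma": ["viagra", "v1agra", "vi@gra"],
    "lottery": ["jackpot", "winner"],
}
index = build_term_index(rules)
triggered = match_rules("You are a WINNER claim your v1agra now", index)
```

Adding more obfuscated variants to a rule enlarges the index but leaves the number of lookups per message unchanged, which is the property the abstract attributes to the proposed algorithm.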
Pages: 82653-82661
Number of pages: 9
Related papers
50 in total
  • [31] Bounding the computation time of forward-chaining rule-based systems
    Tomsovic, Kevin
    Liu, Chen-Ching
    Data and Knowledge Engineering, 1993, 10 (03): : 317 - 334
  • [32] A rule-based fault detection algorithm for a purge system of a ventricular assist device
    Yu, Yih-Choung
    Journal of Engineering, Computing and Architecture, 2008, 2 (01): : 1 - 11
  • [33] Improved algorithm on rule-based reasoning systems modeled by Fuzzy Petri Nets
    Yang, R
    Leung, WS
    Heng, PA
    Leung, KS
    PROCEEDINGS OF THE 2002 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOL 1 & 2, 2002, : 1204 - 1209
  • [34] Pulmonary Nodule Detection in CT Images Using Optimal Multilevel Thresholds and Rule-based Filtering
    Sahu, Satya Prakash
    Londhe, Narendra D.
    Verma, Shrish
    IETE JOURNAL OF RESEARCH, 2022, 68 (01) : 265 - 282
  • [35] RESPONSE-TIME ANALYSIS OF EQL REAL-TIME RULE-BASED SYSTEMS
    CHEN, JR
    CHENG, AMK
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1995, 7 (01) : 26 - 43
  • [36] Automatic Laser Pointer Detection Algorithm for Environment Control Device Systems Based on Template Matching and Genetic Tuning of Fuzzy Rule-Based Systems
    Chavez, F.
    Fernandez, F.
    Gacto, M. J.
    Alcala, R.
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2012, 5 (02) : 368 - 386
  • [37] Automatic Laser Pointer Detection Algorithm for Environment Control Device Systems Based on Template Matching and Genetic Tuning of Fuzzy Rule-Based Systems
    F. Chávez
    F. Fernández
    M.J. Gacto
    R. Alcalá
    International Journal of Computational Intelligence Systems, 2012, 5 : 368 - 386
  • [38] Using rule-based activity descriptions to evaluate intrusion-detection systems
    Alessandri, D
    RECENT ADVANCES IN INTRUSION DETECTION, PROCEEDINGS, 2000, 1907 : 183 - 196
  • [39] Modeling and analysis of functionality in eHome systems: Dynamic rule-based conflict detection
    Armac, Ibrahim
    Kirchhof, Michael
    Manolescu, Liviana
    13TH ANNUAL IEEE INTERNATIONAL SYMPOSIUM AND WORKSHOP ON ENGINEERING OF COMPUTER BASED SYSTEMS, PROCEEDINGS: MASTERING THE COMPLEXITY OF COMPUTER-BASED SYSTEMS, 2006, : 219 - +
  • [40] Visual Parking Space Estimation Using Detection Networks and Rule-Based Systems
    De Luelmo, Susana P.
    Giraldo Del Viejo, Elena
    Montemayor, Antonio S.
    Jose Pantrigo, Juan
    BIO-INSPIRED SYSTEMS AND APPLICATIONS: FROM ROBOTICS TO AMBIENT INTELLIGENCE, PT II, 2022, 13259 : 583 - 592