A Constant Time Complexity Spam Detection Algorithm for Boosting Throughput on Rule-Based Filtering Systems

Cited by: 14
Author
Xia, Tian [1]
Affiliation
[1] Shanghai Polytechnic University, Computer and Information Engineering Department, Shanghai 201209, People's Republic of China
Keywords
Constant time complexity; hash forest; rule-based filtering; spam detection; throughput; support vector machines
DOI
10.1109/ACCESS.2020.2991328
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Along with the rapid growth of spam, anti-spam technologies, including rule-based and machine-learning approaches, have developed rapidly as well. In the anti-spam industry, rule-based systems (RBS) have become the most prominent method for fighting spam because their rules can be enriched and updated remotely. However, filtering throughput remains a major challenge for RBS. In particular, the explosive spread of obfuscated words leads to frequent rule updates and a rapidly expanding rule vocabulary, and these additional obfuscated words slow filtering and reduce throughput. This paper addresses the throughput problem and proposes a rule-based spam detection algorithm with constant time complexity. Its processing speed is constant, independent of the number of rules and the size of their vocabulary. A new data structure, the Hash Forest, and a rule encoding method are developed to make constant time complexity possible. Instead of traversing every spam term in the rules, the proposed algorithm detects spam terms by checking only a very small portion of all terms. Experimental results show the effectiveness of the proposed algorithm.
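The abstract does not describe the internals of the Hash Forest or the rule encoding. The following is therefore only a generic sketch and not the paper's method. It illustrates how hash lookups can make per-message work depend on message length rather than on the size of the rule vocabulary. It assumes that each rule is a conjunction of terms and that each term is recorded as one bit in the rule's bitmask. The class `HashRuleIndex`, the `tokenize` helper, the bitmask encoding and the sample rules are all hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[\w']+")

def tokenize(text):
    # Lowercase word tokens with punctuation dropped (assumed normalization, not the paper's)
    return TOKEN.findall(text.lower())

class HashRuleIndex:
    """Hypothetical hash-based rule matcher; NOT the paper's Hash Forest."""

    def __init__(self, rules):
        # rules: rule name -> list of terms; a rule fires only when all its terms occur
        self.postings = defaultdict(list)  # term tuple -> [(rule name, bit index)]
        self.full_mask = {}                # rule name -> bitmask with one bit per term
        self.max_ngram = 1                 # longest term, in tokens
        for name, terms in rules.items():
            for bit, term in enumerate(terms):
                key = tuple(tokenize(term))
                self.postings[key].append((name, bit))
                self.max_ngram = max(self.max_ngram, len(key))
            self.full_mask[name] = (1 << len(terms)) - 1

    def match(self, text):
        tokens = tokenize(text)
        seen = defaultdict(int)  # rule name -> bitmask of terms found so far
        for i in range(len(tokens)):
            # At most max_ngram hash lookups per position, regardless of vocabulary size
            for n in range(1, min(self.max_ngram, len(tokens) - i) + 1):
                for name, bit in self.postings.get(tuple(tokens[i:i + n]), ()):
                    seen[name] |= 1 << bit
        # A rule matches when every one of its term bits has been set
        return sorted(name for name, mask in seen.items() if mask == self.full_mask[name])

# Hypothetical example rules, including an obfuscated term ("v1agra")
rules = {
    "pharma": ["cheap", "v1agra"],
    "lottery": ["you have won", "claim"],
}
idx = HashRuleIndex(rules)
print(idx.match("Cheap V1agra here"))        # ['pharma']
print(idx.match("You have won! Claim now"))  # ['lottery']
print(idx.match("cheap flights"))            # []
```

Each position in a message costs at most `max_ngram` hash lookups. Adding newly observed obfuscated terms to the rules therefore increases memory use but not the number of lookups per token. The work done for a matched term still grows with the number of rules that share that term.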
Pages: 82653-82661
Number of pages: 9