Feature evaluation for web crawler detection with data mining techniques

Cited by: 56
Authors
Stevanovic, Dusan [1]
An, Aijun [1]
Vlajic, Natalija [1]
Affiliation
[1] York Univ, Dept Comp Sci & Engn, Toronto, ON M3J 1P3, Canada
Keywords
Web crawler detection; Web server access logs; Data mining; Classification; DDoS; WEKA
DOI
10.1016/j.eswa.2012.01.210
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Distributed Denial of Service (DDoS) is one of the most damaging attacks on Internet security today. Recently, malicious web crawlers have been used to execute automated DDoS attacks on web sites across the WWW. In this study we examine the effect of applying seven well-established data mining classification algorithms to static web server access logs in order to: (1) classify user sessions as belonging to either automated web crawlers or human visitors and (2) identify which of the automated web crawler sessions exhibit 'malicious' behavior and are potentially participants in a DDoS attack. The classification performance is evaluated in terms of classification accuracy, recall, precision and F1 score. Seven of the nine vector (i.e. web-session) features employed in our work are borrowed from earlier studies on classification of user sessions as belonging to web crawlers. However, we also introduce two novel web-session features: the consecutive sequential request ratio and the standard deviation of page request depth. The effectiveness of the new features is evaluated in terms of the information gain and gain ratio metrics. The experimental results demonstrate the potential of the new features to improve the accuracy of data mining classifiers in identifying malicious and well-behaved web crawler sessions. © 2012 Elsevier Ltd. All rights reserved.
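The abstract names information gain and gain ratio as the metrics used to rank session features. The following is a minimal sketch of the standard textbook definitions for a discrete feature, not the authors' implementation; the function names and the toy session data are assumptions made for illustration, and continuous features would need to be binned before being passed in.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature_values, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature_values, labels) / split_info


# Hypothetical toy data: four sessions, one binned feature, two classes.
labels = ["crawler", "crawler", "human", "human"]
informative = ["high", "high", "low", "low"]   # separates the classes
uninformative = ["a", "b", "a", "b"]           # independent of the classes
```

On this toy data the informative feature separates the two classes perfectly, while the uninformative one leaves the label entropy unchanged, so a ranking by either metric would place the first feature above the second.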
Pages: 8707-8717
Page count: 11
Related papers
50 in total
  • [1] A web crawler design for data mining
    Thelwall, M
    [J]. JOURNAL OF INFORMATION SCIENCE, 2001, 27 (05) : 319 - 325
  • [2] Mining Techniques of XSS Vulnerabilities Based on Web Crawler
    Wan Fangfang
    Xie Xusheng
    [J]. MECHATRONICS ENGINEERING, COMPUTING AND INFORMATION TECHNOLOGY, 2014, 556-562 : 6290 - 6293
  • [3] Research of Web Information Mining by using Crawler Techniques
    Li, Qing-Cheng
    Lin, Shan
    Dong, Zhen-Hua
    [J]. 2008 INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, VOLS 1-4, 2008, : 1603 - 1607
  • [4] Research on Web Data Mining Based on Topic Crawler
    Guo, Hongjian
    [J]. JOURNAL OF WEB ENGINEERING, 2021, 20 (04): : 1131 - 1143
  • [5] Design and Application of Intelligent Dynamic Crawler for Web Data Mining
    Zheng Guojun
    Jia Wenchao
    Shi Jihui
    Shi Fan
    Zhu Hao
    Liu Jiang
    [J]. 2017 32ND YOUTH ACADEMIC ANNUAL CONFERENCE OF CHINESE ASSOCIATION OF AUTOMATION (YAC), 2017, : 1098 - 1105
  • [6] Enhancing feature extraction for VF detection using data mining techniques
    Rosado-Muñoz, A
    Camps-Valls, G
    Guerrero-Martínez, J
    Francés-Villora, JV
    Muñoz-Marí, J
    Serrano-López, AJ
    [J]. COMPUTERS IN CARDIOLOGY 2002, VOL 29, 2002, 29 : 209 - 212
  • [7] Applying data mining techniques in intrusion detection system on web and analysis of web usage
    Al-Ahliyya Amman University, Amman, Jordan
    Unknown
    [J]. Inf. Technol. J., 2006, 1 (57-63):
  • [8] Detection of Malicious Requests on Web Logs Using Data Mining Techniques
    Sahin, Mehmet Emin
    Ozdemir, Suat
    [J]. 2019 4TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2019, : 463 - 468
  • [9] Web Data Mining Trends and Techniques
    Patil, Ujwala Manoj
    Patil, J. B.
    [J]. PROCEEDINGS OF THE 2012 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI'12), 2012, : 961 - 965
  • [10] Applying clickstream data mining to real-time Web crawler detection and containment using ClickTips platform
    Lourenco, Analia
    Belo, Orlando
    [J]. ADVANCES IN DATA ANALYSIS, 2007, : 351 - +