A Density Based Clustering Approach to Distinguish Between Web Robot and Human Requests to a Web Server

Cited by: 0
Authors
Zabihi, Mahdieh [1]
Jahan, Majid Vafaei [2]
Hamidzadeh, Javad [3]
Affiliations
[1] Imam Reza Int Univ, Mashhad, Iran
[2] Islamic Azad Univ, Dept Comp Engn, Mashhad Branch, Mashhad, Iran
[3] Sadjad Univ Technol, Fac Comp Engn & Informat Technol, Mashhad, Iran
Keywords
Behavioral Patterns of Web Visitors; DBSCAN; Density Based Clustering; Significance of the Difference Test; Web Robots;
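DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract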
The world's growing dependence on the Internet and the emergence of Web 2.0 applications are significantly increasing the need for web robots that crawl sites to support services and technologies. Despite their advantages, robots may consume bandwidth and reduce the performance of web servers. Although the problem has been widely studied, there is no accurate method for classifying huge data sets of web visitors in a reasonable amount of time. Moreover, such a method should be insensitive to the ordering of instances and produce deterministic, accurate results. This paper therefore presents a density-based clustering approach, using Density-Based Spatial Clustering of Applications with Noise (DBSCAN), to classify the web visitors of two large real-world data sets. We propose two new features, based on the behavioral patterns of visitors, to describe them. In addition, we consider 12 common features and use the significance of the difference test (t-test) to reduce the dimensionality and overcome one of the disadvantages of DBSCAN. Based on supervised evaluation metrics, the proposed algorithm achieves a Jaccard metric of 95% and produces two clusters with entropy and purity of 0.024 and 0.97, respectively. Furthermore, in terms of clustering quality and accuracy, the proposed method performs better than state-of-the-art algorithms. Finally, we conclude that some known web robots are difficult to identify because they imitate human users. © 2014 ISC. All rights reserved.
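The abstract reports purity, entropy, and a Jaccard metric but does not define them. The following is a minimal sketch that assumes the common textbook definitions: purity as the majority-class fraction, entropy as the size-weighted base-2 class entropy per cluster, and Jaccard as the pair-counting index. The function names and the toy data are hypothetical, and the paper's exact formulas may differ.

```python
from collections import Counter
from itertools import combinations
from math import log2


def _class_counts(clusters, labels):
    """Map each cluster id to a Counter of the true classes it contains."""
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return by_cluster


def purity(clusters, labels):
    """Fraction of points that belong to the majority class of their cluster."""
    counts = _class_counts(clusters, labels)
    return sum(max(cnt.values()) for cnt in counts.values()) / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of the base-2 class entropy inside each cluster."""
    n = len(labels)
    total = 0.0
    for cnt in _class_counts(clusters, labels).values():
        size = sum(cnt.values())
        h = -sum((k / size) * log2(k / size) for k in cnt.values())
        total += (size / n) * h
    return total


def jaccard(clusters, labels):
    """Pair-counting Jaccard: pairs grouped together by both / by either."""
    both = either = 0
    for i, j in combinations(range(len(labels)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = labels[i] == labels[j]
        both += same_cluster and same_class
        either += same_cluster or same_class
    return both / either if either else 1.0


# Toy data (hypothetical): six visitors, two clusters, one robot misplaced.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["robot", "robot", "human", "human", "human", "human"]

if __name__ == "__main__":
    print(purity(toy_clusters, toy_labels))
    print(entropy(toy_clusters, toy_labels))
    print(jaccard(toy_clusters, toy_labels))
```

Under these assumed definitions, lower entropy and higher purity and Jaccard values indicate clusters that agree more closely with the true robot and human labels. The pairwise Jaccard loop is quadratic in the number of visitors, so a contingency-table formulation would be preferable for large logs. Points that DBSCAN marks as noise would need an explicit policy (for example, excluding them or treating them as a separate cluster), which this sketch does not address.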
Pages: 77-89
Number of pages: 13