A Density Based Clustering Approach to Distinguish Between Web Robot and Human Requests to a Web Server

Authors
Zabihi, Mahdieh [1]
Jahan, Majid Vafaei [2]
Hamidzadeh, Javad [3]
Affiliations
[1] Imam Reza Int Univ, Mashhad, Iran
[2] Islamic Azad Univ, Dept Comp Engn, Mashhad Branch, Mashhad, Iran
[3] Sadjad Univ Technol, Fac Comp Engn & Informat Technol, Mashhad, Iran
Keywords
Behavioral Patterns of Web Visitors; DBSCAN; Density Based Clustering; Significance of the Difference Test; Web Robots;
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The world's growing dependence on the Internet and the emergence of Web 2.0 applications are significantly increasing the need for web robots that crawl sites to support services and technologies. Despite their advantages, robots may consume bandwidth and degrade the performance of web servers. Although the problem has been widely studied, there is no accurate method for classifying huge data sets of web visitors in a reasonable amount of time. Moreover, such a method should be insensitive to the ordering of instances and produce deterministic, accurate results. Therefore, this paper presents a density-based clustering approach using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to classify the web visitors of two large real data sets. We propose two new features based on the behavioral patterns of visitors to describe them. In addition, we consider 12 common features and use the significance of the difference test (t-test) to reduce the dimensions and overcome one of the disadvantages of DBSCAN. Based on supervised evaluation metrics, the proposed algorithm achieves a Jaccard metric of 95% and produces two clusters with entropy and purity rates of 0.024 and 0.97, respectively. Furthermore, in terms of clustering quality and accuracy, the proposed method performs better than state-of-the-art algorithms. Finally, it can be concluded that some known web robots imitate human users, which makes them difficult to identify. (C) 2014 ISC. All rights reserved.
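The pipeline summarized in the abstract (cluster visitor sessions with DBSCAN, then score the clusters against known labels) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the two-dimensional session features, the `eps` and `min_pts` values, and the toy data are all hypothetical, and the t-test feature reduction step is omitted. The abstract does not define its metrics, so the sketch assumes the common pair-counting form of the Jaccard coefficient and a size-weighted, base-2 cluster entropy.

```python
from collections import Counter
from itertools import combinations
from math import dist, log2


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; returns a cluster id per point, -1 for noise."""
    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1              # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # former noise becomes a border point
            if labels[j] is not None:
                continue                # already assigned to a cluster
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


def class_counts(clusters, classes):
    """Map each cluster id to a Counter of the true classes it contains."""
    table = {}
    for c, k in zip(clusters, classes):
        table.setdefault(c, Counter())[k] += 1
    return table


def purity(clusters, classes):
    """Fraction of points that belong to the majority class of their cluster."""
    table = class_counts(clusters, classes)
    return sum(max(cnt.values()) for cnt in table.values()) / len(clusters)


def entropy(clusters, classes):
    """Size-weighted average of the class entropy (base 2) in each cluster."""
    n, total = len(clusters), 0.0
    for cnt in class_counts(clusters, classes).values():
        size = sum(cnt.values())
        h = -sum((v / size) * log2(v / size) for v in cnt.values())
        total += (size / n) * h
    return total


def jaccard(clusters, classes):
    """Pair-counting Jaccard: agreeing pairs / pairs grouped by either side."""
    both = cluster_only = class_only = 0
    for i, j in combinations(range(len(clusters)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = classes[i] == classes[j]
        if same_cluster and same_class:
            both += 1
        elif same_cluster:
            cluster_only += 1
        elif same_class:
            class_only += 1
    return both / (both + cluster_only + class_only)


# Hypothetical sessions described by two made-up numeric features.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
# Hypothetical ground truth: the fourth session is a robot that behaves
# like a human and therefore falls inside the "human" cluster.
truth = ["human", "human", "human", "robot",
         "robot", "robot", "robot", "robot"]

labels = dbscan(points, eps=1.5, min_pts=3)
print(labels, purity(labels, truth), entropy(labels, truth),
      jaccard(labels, truth))
```

On this toy data DBSCAN finds two clusters, and the single human-like robot lowers purity to 0.875 and the Jaccard coefficient to 0.5625, which mirrors the abstract's closing remark that robots imitating human users are hard to identify.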
Pages: 77-89
Number of pages: 13