A Density Based Clustering Approach to Distinguish Between Web Robot and Human Requests to a Web Server

Cited by: 0
Authors
Zabihi, Mahdieh [1]
Jahan, Majid Vafaei [2]
Hamidzadeh, Javad [3]
Affiliations
[1] Imam Reza Int Univ, Mashhad, Iran
[2] Islamic Azad Univ, Dept Comp Engn, Mashhad Branch, Mashhad, Iran
[3] Sadjad Univ Technol, Fac Comp Engn & Informat Technol, Mashhad, Iran
Keywords
Behavioral Patterns of Web Visitors; DBSCAN; Density Based Clustering; Significance of the Difference Test; Web Robots;
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The world's growing dependence on the Internet and the emergence of Web 2.0 applications are significantly increasing the need for web robots that crawl sites to support services and technologies. Despite their advantages, robots may consume bandwidth and degrade the performance of web servers. Although the problem has been widely studied, no existing method accurately classifies huge data sets of web visitors in a reasonable amount of time. Moreover, such a technique should be insensitive to the ordering of instances and produce deterministic, accurate results. This paper therefore presents a density-based clustering approach that uses Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to classify the web visitors of two large real data sets. We propose two new features, based on the behavioral patterns of visitors, to describe them. In addition, we consider 12 common features and use the significance of the difference test (t-test) to reduce the dimensionality and overcome one of the disadvantages of DBSCAN. Based on supervised evaluation metrics, the proposed algorithm achieves a Jaccard metric of 95% and produces two clusters with entropy and purity of 0.024 and 0.97, respectively. Furthermore, in terms of clustering quality and accuracy, the proposed method performs better than state-of-the-art algorithms. Finally, it can be concluded that some known web robots imitate human users, which makes them difficult to identify. (C) 2014 ISC. All rights reserved.
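The three supervised metrics named in the abstract (Jaccard, entropy, purity) can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so the code below assumes the common textbook definitions: purity as the share of points matching their cluster's majority label, entropy as the size-weighted base-2 label entropy per cluster, and a pair-counting Jaccard coefficient. The function names and the eight-visitor toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import log2


def _label_counts(clusters, labels):
    """Count the true labels inside each cluster (helper, hypothetical name)."""
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return by_cluster


def purity(clusters, labels):
    """Fraction of points that carry the majority true label of their cluster."""
    counts = _label_counts(clusters, labels)
    return sum(max(cnt.values()) for cnt in counts.values()) / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of the base-2 label entropy inside each cluster."""
    n = len(labels)
    total = 0.0
    for cnt in _label_counts(clusters, labels).values():
        size = sum(cnt.values())
        h = -sum((k / size) * log2(k / size) for k in cnt.values())
        total += (size / n) * h
    return total


def pairwise_jaccard(clusters, labels):
    """Pair-counting Jaccard: pairs grouped together in both the clustering and
    the ground truth, divided by pairs grouped together in at least one."""
    both = cluster_only = label_only = 0
    for i, j in combinations(range(len(labels)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_label = labels[i] == labels[j]
        if same_cluster and same_label:
            both += 1
        elif same_cluster:
            cluster_only += 1
        elif same_label:
            label_only += 1
    denom = both + cluster_only + label_only
    return both / denom if denom else 1.0


# Toy data (assumed for illustration): one robot is grouped with the humans.
labels = ["human"] * 4 + ["robot"] * 4
clusters = [0, 0, 0, 0, 1, 1, 1, 0]

print("purity :", purity(clusters, labels))
print("entropy:", round(entropy(clusters, labels), 4))
print("jaccard:", pairwise_jaccard(clusters, labels))
```

Under these definitions a perfect two-cluster split would give purity 1, entropy 0, and Jaccard 1, so the reported 0.97, 0.024, and 95% all describe clusters that are close to pure.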
Pages: 77-89
Number of pages: 13