Discovery of Web robot sessions based on their navigational patterns

Cited by: 137
Authors
Tan, PN [1]
Kumar, V [1]
Affiliation
[1] Univ Minnesota, Dept Comp Sci, Minneapolis, MN 55455 USA
Funding
National Science Foundation (USA);
Keywords
web usage mining; web robot detection; classification; data mining;
DOI
10.1023/A:1013228602957
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Web robots are software programs that automatically traverse the hyperlink structure of the World Wide Web in order to locate and retrieve information. There are many reasons why it is important to identify visits by Web robots and distinguish them from other users. First of all, e-commerce retailers are particularly concerned about the unauthorized deployment of robots for gathering business intelligence at their Web sites. In addition, Web robots tend to consume considerable network bandwidth at the expense of other users. Sessions due to Web robots also make it more difficult to perform clickstream analysis effectively on the Web data. Conventional techniques for detecting Web robots are often based on identifying the IP address and user agent of the Web clients. While these techniques are applicable to many well-known robots, they may not be sufficient to detect camouflaged and previously unknown robots. In this paper, we propose an alternative approach that uses the navigational patterns in the clickstream data to determine whether a session is due to a robot. Experimental results on our Computer Science department Web server logs show that highly accurate classification models can be built using this approach. We also show that these models are able to discover many camouflaged and previously unidentified robots.
Pages: 9-35
Page count: 27
Related papers
50 in total
  • [1] Discovery of Web Robot Sessions Based on their Navigational Patterns
    Pang-Ning Tan
    Vipin Kumar
    [J]. Data Mining and Knowledge Discovery, 2002, 6 : 9 - 35
  • [2] Fast discovery of structural navigational patterns from web user traversals
    Shan, MK
    Li, HF
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY: THEORY, TOOLS AND TECHNOLOGY IV, 2002, 4730 : 272 - 283
  • [3] Web Usage Mining: Discovery of the users' navigational patterns using SOM
    Etminani, Kobra
    Delui, Amin Rezaeian
    Yanehsari, Noorali Raeeji
    Rouhani, Modjtaba
    [J]. NDT: 2009 FIRST INTERNATIONAL CONFERENCE ON NETWORKED DIGITAL TECHNOLOGIES, 2009, : 244 - +
  • [4] Exploring navigational patterns on the Web
    Zimmerman, D
    Walls, P
    [J]. IEEE PROFESSIONAL COMMUNICATION SOCIETY INTERNATIONAL PROFESSIONAL COMMUNICATION CONFERENCE AND ACM SPECIAL INTEREST GROUP ON DOCUMENTATION CONFERENCE, 2000, : 581 - 591
  • [5] Web-based Remote Navigational Robot for Multiclass Human-Robot Interaction
    Yeoh, Kenny Ju Min
    Wong, Hwee Ling
    [J]. 2012 IEEE CONFERENCE ON SUSTAINABLE UTILIZATION AND DEVELOPMENT IN ENGINEERING AND TECHNOLOGY (STUDENT), 2012, : 170 - 175
  • [6] Web site personalization based on link analysis and navigational patterns
    Eirinaki, Magdalini
    Vazirgiannis, Michalis
    [J]. ACM TRANSACTIONS ON INTERNET TECHNOLOGY, 2007, 7 (04)
  • [7] Improving Web information systems with navigational patterns
    Rossi, G
    Schwabe, D
    Lyardet, F
    [J]. PROCEEDINGS OF THE EIGHTH INTERNATIONAL WORLD WIDE WEB CONFERENCE, 1999, : 589 - 600
  • [8] Improving Web information systems with navigational patterns
    Rossi, Gustavo
    Schwabe, Daniel
    Lyardet, Fernando
    [J]. Computer Networks, 1999, 31 (11) : 1667 - 1678
  • [9] Improving Web information systems with navigational patterns
    Rossi, G
    Schwabe, D
    Lyardet, F
    [J]. COMPUTER NETWORKS-THE INTERNATIONAL JOURNAL OF COMPUTER AND TELECOMMUNICATIONS NETWORKING, 1999, 31 (11-16) : 1667 - 1678
  • [10] Fuzzy c-Least Medians clustering for discovery of web access patterns from web user sessions data
    Ansari, Zahid
    Faizabadi, Ahmed Rimaz
    Afzal, Asif
    [J]. INTELLIGENT DATA ANALYSIS, 2017, 21 (03) : 553 - 575