Using HMM to learn user browsing patterns for focused Web crawling

Cited by: 44
Authors
Liu, Hongyu
Janssen, Jeannette
Milios, Evangelos
Affiliations
[1] Dalhousie Univ, Fac Comp Sci, Halifax, NS B3H 1W5, Canada
[2] Dalhousie Univ, Dept Math & Stat, Halifax, NS B3H 1W5, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
focused crawling; Web searching; relevance modelling; user modelling; pattern learning; Hidden Markov models; World Wide Web; Web Graph;
DOI
10.1016/j.datak.2006.01.012
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A focused crawler is designed to traverse the Web to gather documents on a specific topic. It can be used to build domain-specific Web search portals and online personalized search tools. To estimate the relevance of a newly seen URL, it must use information gleaned from previously crawled page sequences. In this paper, we present a new approach for prediction of the links leading to relevant pages based on a Hidden Markov Model (HMM). The system consists of three stages: user data collection, user modelling via sequential pattern learning, and focused crawling. In particular, we first collect the Web pages visited during a user browsing session. These pages are clustered, and the link structure among pages from different clusters is then used to learn page sequences that are likely to lead to target pages. The learning is performed using HMM. During crawling, the priority of links to follow is based on a learned estimate of how likely the page is to lead to a target page. We compare the performance with Context-Graph crawling and Best-First crawling. Our experiments demonstrate that this approach performs better than Context-Graph crawling and Best-First crawling. (c) 2006 Elsevier B.V. All rights reserved.
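The crawling stage described in the abstract, where each link's priority is a learned estimate of how likely it is to lead to a target page, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that each visited page is observed as a cluster ID, that hidden state 0 means "target page", and that the priority is the forward-filtered probability of reaching state 0 on the next step. The names `forward_filter`, `link_priority`, `Frontier`, and all parameter values are hypothetical.

```python
import heapq


def forward_filter(obs_seq, start, trans, emit):
    """Normalized HMM forward pass: P(current hidden state | observations so far)."""
    n = len(start)
    belief = [start[s] * emit[s][obs_seq[0]] for s in range(n)]
    total = sum(belief)
    belief = [b / total for b in belief]
    for obs in obs_seq[1:]:
        # Predict the next state distribution, then weight by the new observation.
        predicted = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
        belief = [predicted[j] * emit[j][obs] for j in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
    return belief


def link_priority(obs_seq, start, trans, emit, target_state=0):
    """Estimated probability that the next page on this path is a target page."""
    belief = forward_filter(obs_seq, start, trans, emit)
    return sum(belief[i] * trans[i][target_state] for i in range(len(belief)))


class Frontier:
    """Max-priority queue of URLs (heapq is a min-heap, so priorities are negated)."""

    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker so equal priorities pop in insertion order

    def push(self, url, priority):
        heapq.heappush(self._heap, (-priority, self._count, url))
        self._count += 1

    def pop(self):
        neg_priority, _, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self):
        return len(self._heap)


# Toy, made-up parameters: 2 hidden states (0 = target, 1 = non-target)
# and 2 observable page clusters (0 and 1). Not taken from the paper.
START = [0.2, 0.8]
TRANS = [[0.6, 0.4], [0.3, 0.7]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]

frontier = Frontier()
# Each link inherits the cluster sequence of the path that led to it.
frontier.push("http://example.org/a", link_priority([0], START, TRANS, EMIT))
frontier.push("http://example.org/b", link_priority([1], START, TRANS, EMIT))
best_url, best_priority = frontier.pop()
```

With these toy parameters, the link reached through a page in cluster 0 is popped first, because cluster 0 is the one most strongly associated with the target state. In a real crawler the HMM parameters would be learned from the clustered user browsing sessions rather than set by hand.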
Pages: 270-291
Page count: 22