An Integrated Crawling Strategy for Domain-specific Resource Discovery

Cited by: 1
Authors
Yuan, Fuyong [1]
Yin, Chunxia [1]
Liu, Jian [1]
Zhang, Yulian [1]
Affiliation
[1] Yanshan Univ, Coll Informat Sci & Engn, Qinhuangdao, Peoples R China
Keywords
resource discovery; topic-specific crawler; URL ordering
DOI
10.1109/SITIS.2007.70
Chinese Library Classification code
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A topic-specific crawler selectively seeks out pages that are relevant to a predefined set of topics, rather than exploring all regions of the Web, which makes it important for domain-specific resource discovery. By restricting themselves to web pages from a specific domain, topic-specific crawlers yield good recall as well as good precision. In this paper, we present an integrated topic-specific crawling strategy. The crawling process has two main features: a topic specification module that mediates between users and search engines and identifies starting URLs by computing hub scores with the BHIST algorithm, and a URL ordering algorithm that combines features of several previous approaches. Experimental results indicate that the new crawling method performs better and fetches more topic-relevant information.
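The seed-selection step described in the abstract can be illustrated with a minimal sketch. This record does not define the BHIST algorithm, so the code below uses the textbook HITS hub/authority iteration as a stand-in and should not be read as the authors' method. The function names, the iteration count, and the example URLs are all hypothetical.

```python
from math import sqrt


def hits_hub_scores(links, iterations=20):
    """Textbook HITS iteration (a stand-in, NOT the paper's BHIST algorithm).

    links maps each page URL to the list of URLs it links to.
    Returns a dict of hub scores, normalised to unit Euclidean length.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to the page.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[d] for d in links.get(p, ())) for p in pages}
        # Normalise both vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub


def pick_seed_urls(links, k=2):
    """Return the k pages with the highest hub scores (ties broken by URL)."""
    hub = hits_hub_scores(links)
    return sorted(hub, key=lambda p: (-hub[p], p))[:k]


if __name__ == "__main__":
    # Hypothetical link graph, e.g. built from search-engine results.
    example_links = {
        "http://hub-a.example/": ["http://p1.example/", "http://p2.example/", "http://p3.example/"],
        "http://hub-b.example/": ["http://p1.example/", "http://p2.example/"],
        "http://p1.example/": ["http://p2.example/"],
    }
    print(pick_seed_urls(example_links, k=2))
```

Pages that link to many well-cited pages receive high hub scores, which is why they are plausible starting points for a crawl. How the paper combines these scores with its URL ordering algorithm is not stated in this record.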
Pages: 329-336
Page count: 8