An architecture for a focused trend parallel Web crawler with the application of clickstream analysis

Cited by: 30
Authors
Ahmadi-Abkenari, Fatemeh [1]
Selamat, Ali [1]
Affiliations
[1] Univ Teknol Malaysia, Fac Comp Sci & Informat Syst, Dept Software Engn, Software Engn Res Grp, Johor Baharu, Malaysia
Keywords
Clickstream analysis; Focused crawlers; Parallel crawlers; Web data management; Web page importance metrics
DOI
10.1016/j.ins.2011.08.022
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The tremendous growth of the Web poses many challenges for all-purpose single-process crawlers, including the presence of some irrelevant answers among search results and the coverage and scaling issues arising from the enormous size of the World Wide Web. Hence, more enhanced and convincing algorithms are in demand to yield more precise and relevant search results in an appropriate amount of time. Since employing link-based Web page importance metrics within a multi-process crawler imposes a considerable communication overhead on the overall system and cannot produce a precise answer set, employing these metrics in search engines is not an absolute solution for identifying the best search answer set. Thus, a link-independent Web page importance metric is required to govern the priority rule within the queue of fetched URLs. The aim of this paper is to propose a modest weighted architecture for a focused structured parallel Web crawler which employs a link-independent clickstream-based Web page importance metric. The experiments with this metric over the restricted boundary Web zone of our crowded UTM University Web site show the efficiency of the proposed metric. (C) 2011 Elsevier Inc. All rights reserved.
Pages: 266-281
Number of pages: 16
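
The abstract's central idea, a link-independent clickstream score governing the priority of queued URLs, can be illustrated with a minimal sketch. This is not the authors' metric or architecture, which the abstract does not specify. It assumes a raw visit count as a stand-in importance score and a hypothetical log format of `session_id<TAB>url` per line. The names `count_page_visits` and `ClickstreamFrontier` are invented for illustration.

```python
import heapq
from collections import Counter


def count_page_visits(log_lines):
    """Count visits per URL from hypothetical 'session_id<TAB>url' log lines."""
    visits = Counter()
    for line in log_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip malformed lines
        _session_id, url = parts
        visits[url] += 1
    return visits


class ClickstreamFrontier:
    """URL queue that pops the most-visited URL first (assumed stand-in metric)."""

    def __init__(self, visits):
        self._visits = visits
        self._heap = []
        self._seen = set()
        self._order = 0  # tie-breaker: earlier pushes win among equal scores

    def push(self, url):
        if url in self._seen:
            return  # each URL is queued at most once
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self._heap, (-self._visits.get(url, 0), self._order, url))
        self._order += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


# Example usage with a made-up three-line log.
sample_log = [
    "s1\thttp://example.org/a",
    "s1\thttp://example.org/b",
    "s2\thttp://example.org/b",
]
frontier = ClickstreamFrontier(count_page_visits(sample_log))
for candidate in ("http://example.org/a", "http://example.org/b", "http://example.org/c"):
    frontier.push(candidate)
print(frontier.pop())  # the URL with the most recorded visits in sample_log
```

Because the score is computed from access logs alone, no link-graph information has to be exchanged between parallel crawler processes to order the queue, which is the communication overhead the abstract attributes to link-based metrics.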