A Framework of a Hybrid Focused Web Crawler

Cited by: 0
Authors
Sun, Yixue [1]
Jin, Peiquan [1]
Yue, Lihua [1]
Affiliations
[1] Univ Sci & Technol China, Dept Comp Sci & Technol, Hefei, Peoples R China
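Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract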
Because of the complex structure of the Web, most approaches to focused crawling employ a local search algorithm, which searches only the pages in a sub-graph of the Web. In addition, Web pages often cover multiple topics, which makes it difficult to determine the relevance of a page to a given topic. To address these two issues, this paper presents a new hybrid approach to focused crawling based on meta-search and the VIPS (VIsion-based Page Segmentation) algorithm. Meta-search is used to achieve a wider crawling range than traditional local search algorithms. To obtain better recall and precision, a VIPS-based algorithm is used to compute the relevance of a Web page: it first partitions the page into a set of blocks that reflect the semantic structure of the page. After a short review of related work, the paper discusses the system architecture of the hybrid focused crawler and then presents the framework of the hybrid focused crawling approach.
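The abstract does not give the relevance formula, so the following is only a minimal sketch of the block-level idea, under stated assumptions. It does not implement VIPS: the `blocks` argument is a hypothetical stand-in for the text of the blocks a segmenter would produce. Cosine similarity over term frequencies and taking the maximum over blocks are illustrative choices, not the authors' method. The function names `page_relevance` and `whole_page_relevance` are invented for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def page_relevance(blocks, topic):
    """Score a page by its best-matching block.

    `blocks` is assumed to be the text of each segment produced by a
    page segmenter such as VIPS (not implemented here). Using the
    maximum over blocks is an illustrative assumption.
    """
    topic_vec = Counter(tokenize(topic))
    scores = (cosine(Counter(tokenize(b)), topic_vec) for b in blocks)
    return max(scores, default=0.0)


def whole_page_relevance(blocks, topic):
    """Baseline: score the page as one undivided bag of words."""
    page_vec = Counter(tokenize(" ".join(blocks)))
    return cosine(page_vec, Counter(tokenize(topic)))
```

On a multi-topic page, the whole-page score is diluted by off-topic blocks, while the block-level score still reflects the one block that matches the topic. This is the motivation the abstract gives for segmenting before computing relevance.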
Pages: 146-149
Number of pages: 4
Related papers
50 in total
  • [1] An improved focused web crawler based on hybrid similarity
    Shang, Songtao
    Wu, Huaiguang
    Ma, Jiangtao
    [J]. International Journal of Performability Engineering, 2019, 15 (10) : 2645 - 2656
  • [2] Keyword Focused Web Crawler
    Agre, Gunjan H.
    Mahajan, Nikita V.
    [J]. 2015 2ND INTERNATIONAL CONFERENCE ON ELECTRONICS AND COMMUNICATION SYSTEMS (ICECS), 2015, : 1089 - 1092
  • [3] LSCrawler: A framework for an enhanced focused web crawler based on link semantics
    Yuvarani, M.
    Iyengar, N. Ch. S. N.
    Kannan, A.
    [J]. 2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 794 - 797
  • [4] Smart Focused Web Crawler for Hidden Web
    Kaur, Sawroop
    Geetha, G.
    [J]. INFORMATION AND COMMUNICATION TECHNOLOGY FOR COMPETITIVE STRATEGIES, 2019, 40 : 419 - 427
  • [5] A Focused Crawler for Dark Web Forums
    Fu, Tianjun
    Abbasi, Ahmed
    Chen, Hsinchun
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2010, 61 (06) : 1213 - 1231
  • [6] An algorithm OFC for the focused web crawler
    Zhu, Qiang
    [J]. PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 4059 - 4063
  • [7] Focused Web Crawler for Indonesian Recipes
    Alfarisy, Gusti Ahmad Fanshuri
    Bachtiar, Fitra A.
    [J]. 2017 INTERNATIONAL CONFERENCE ON SUSTAINABLE INFORMATION ENGINEERING AND TECHNOLOGY (SIET), 2017, : 196 - 202
  • [8] An Enhanced Semantic Focused Web Crawler Based on Hybrid String Matching Algorithm
    Prabha, K. S. Sakunthala
    Mahesh, C.
    Raja, S. P.
    [J]. CYBERNETICS AND INFORMATION TECHNOLOGIES, 2021, 21 (02) : 105 - 120
  • [9] A Framework of Deep Web Crawler
    Xiang Peisu
    Tian Ke
    Huang Qinzhen
    [J]. PROCEEDINGS OF THE 27TH CHINESE CONTROL CONFERENCE, VOL 5, 2008, : 582 - +
  • [10] Keyword query based focused Web crawler
    Kumar, Manish
    Bindal, Ankit
    Gautam, Robin
    Bhatia, Rajesh
    [J]. 6TH INTERNATIONAL CONFERENCE ON SMART COMPUTING AND COMMUNICATIONS, 2018, 125 : 584 - 590