Keyword query based focused Web crawler

Cited by: 27
Authors
Kumar, Manish [1]
Bindal, Ankit [1]
Gautam, Robin [1]
Bhatia, Rajesh [1]
Affiliations
[1] PEC Univ Technol, Chandigarh 160012, India
Keywords
Web crawler; Information retrieval; Focused Web Crawler; Query based crawler
DOI
10.1016/j.procs.2017.12.075
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Finding information on the Web is difficult because of the extremely large volume of data. Search engines can facilitate this task, but it is still difficult to cover all the webpages present on the Web. This paper proposes a query-based crawler in which a set of keywords relevant to the user's topic of interest is used to issue queries to search interfaces. These search interfaces are found on the webpages of the website corresponding to the seed URL. This helps the crawler obtain the most relevant links from the domain without crawling the domain in depth. No existing focused crawling approach uses a query-based approach to find webpages of interest. In the proposed crawler, a list of keywords is passed to the search query interfaces found on the websites. The proposed work returns the most relevant information for the keywords in a particular domain without crawling through many irrelevant intermediate links. © 2018 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the scientific committee of the 6th International Conference on Smart Computing and Communications.
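The idea in the abstract (locate a search interface on the seed page, then submit each keyword to it) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how search interfaces are detected, so the sketch assumes a search interface is any GET form containing a named text or search input. The names `SearchFormFinder` and `build_query_urls`, and the `example.org` URL, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class SearchFormFinder(HTMLParser):
    """Collects GET forms that contain a named text/search input.

    Assumption: such a form is treated as a search interface.
    """

    def __init__(self):
        super().__init__()
        self.forms = []        # (action, field_name) for each candidate form
        self._in_form = False
        self._action = ""
        self._method = "get"
        self._field = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._in_form = True
            self._action = attrs.get("action") or ""
            self._method = (attrs.get("method") or "get").lower()
            self._field = None
        elif tag == "input" and self._in_form and self._field is None:
            input_type = (attrs.get("type") or "text").lower()
            if input_type in ("text", "search") and attrs.get("name"):
                self._field = attrs["name"]

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            if self._method == "get" and self._field:
                self.forms.append((self._action, self._field))
            self._in_form = False


def build_query_urls(seed_url, html, keywords):
    """Return one query URL per (search form, keyword) pair on the seed page."""
    finder = SearchFormFinder()
    finder.feed(html)
    urls = []
    for action, field in finder.forms:
        target = urljoin(seed_url, action)  # resolve relative form actions
        separator = "&" if "?" in target else "?"
        for keyword in keywords:
            urls.append(target + separator + urlencode({field: keyword}))
    return urls


if __name__ == "__main__":
    page = (
        '<form action="/search" method="get">'
        '<input type="text" name="q"><input type="submit"></form>'
    )
    for url in build_query_urls("https://example.org/index.html", page,
                                ["web crawler", "information retrieval"]):
        print(url)
```

A full crawler would then fetch each generated URL and extract the result links. That step is omitted here because it depends on the markup of the target site.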
Pages: 584-590
Number of pages: 7
Related papers
50 in total
  • [1] Keyword Focused Web Crawler
    Agre, Gunjan H.
    Mahajan, Nikita V.
    [J]. 2015 2ND INTERNATIONAL CONFERENCE ON ELECTRONICS AND COMMUNICATION SYSTEMS (ICECS), 2015, : 1089 - 1092
  • [2] An Improved Focused Crawler Based on Text Keyword Extraction
    Zheng, Zhang
    Qian, Du
    [J]. PROCEEDINGS OF 2016 5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), 2016, : 386 - 390
  • [3] LEARNING-based Focused WEB Crawler
    Kumar, Naresh
    Aggarwal, Dhruv
    [J]. IETE JOURNAL OF RESEARCH, 2023, 69 (04) : 2037 - 2045
  • [4] An improved focused web crawler based on hybrid similarity
    Shang, Songtao
    Wu, Huaiguang
    Ma, Jiangtao
    [J]. International Journal of Performability Engineering, 2019, 15 (10) : 2645 - 2656
  • [5] Smart Focused Web Crawler for Hidden Web
    Kaur, Sawroop
    Geetha, G.
    [J]. INFORMATION AND COMMUNICATION TECHNOLOGY FOR COMPETITIVE STRATEGIES, 2019, 40 : 419 - 427
  • [6] A Focused Crawler for Dark Web Forums
    Fu, Tianjun
    Abbasi, Ahmed
    Chen, Hsinchun
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2010, 61 (06) : 1213 - 1231
  • [7] A Framework of a Hybrid Focused Web Crawler
    Sun, Yixue
    Jin, Peiquan
    Yue, Lihua
    [J]. 2008 SECOND INTERNATIONAL CONFERENCE ON FUTURE GENERATION COMMUNICATION AND NETWORKING SYMPOSIA, VOLS 1-5, PROCEEDINGS, 2008, : 146 - 149
  • [8] An algorithm OFC for the focused web crawler
    Zhu, Qiang
    [J]. PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 4059 - 4063
  • [9] Focused Web Crawler for Indonesian Recipes
    Alfarisy, Gusti Ahmad Fanshuri
    Bachtiar, Fitra A.
    [J]. 2017 INTERNATIONAL CONFERENCE ON SUSTAINABLE INFORMATION ENGINEERING AND TECHNOLOGY (SIET), 2017, : 196 - 202
  • [10] A Semantic Focused Web Crawler Based on a Knowledge Representation Schema
    Hernandez, Julio
    Marin-Castro, Heidy M.
    Morales-Sandoval, Miguel
    [J]. APPLIED SCIENCES-BASEL, 2020, 10 (11):