Keyword query based focused Web crawler

Cited by: 28
Authors
Kumar, Manish [1]
Bindal, Ankit [1]
Gautam, Robin [1]
Bhatia, Rajesh [1]
Affiliations
[1] PEC Univ Technol, Chandigarh 160012, India
Keywords
Web crawler; Information retrieval; Focused Web Crawler; Query based crawler
DOI
10.1016/j.procs.2017.12.075
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Finding information on the Web is difficult because of the extremely large volume of data. Search engines can facilitate this task, but it is still difficult to cover all the webpages present on the Web. This paper proposes a query-based crawler in which a set of keywords relevant to the user's topic of interest is used to issue queries on search interfaces. These search interfaces are found on the webpages of the website corresponding to the seed URL. This helps the crawler get the most relevant links from the domain without going deep into that domain. No existing focused crawling approach uses a query-based approach to find webpages of interest. In the proposed crawler, a list of keywords is passed to the search query interfaces found on the websites. The proposed work gives the most relevant information for the keywords in a particular domain without crawling through the many irrelevant links in between. (C) 2018 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the scientific committee of the 6th International Conference on Smart Computing and Communications.
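The abstract gives no implementation details, so the following is only a minimal sketch of the general idea (locate a search form on the seed page, then build one query URL per keyword), not the authors' method. The names `SearchFormFinder` and `build_query_urls` are hypothetical, and the sketch assumes the search interface is a plain HTML form submitted with GET that has a single text or search input. It uses only the Python standard library and makes no network requests.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class SearchFormFinder(HTMLParser):
    """Hypothetical helper: collects (action, field_name) for GET forms
    that contain a text or search input."""

    def __init__(self):
        super().__init__()
        self.forms = []       # list of (form action, query field name)
        self._action = None   # action of the form currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # Assumption: only GET forms are treated as search interfaces.
            if (attrs.get("method") or "get").lower() == "get":
                self._action = attrs.get("action") or ""
        elif tag == "input" and self._action is not None:
            input_type = (attrs.get("type") or "text").lower()
            if input_type in ("text", "search") and attrs.get("name"):
                self.forms.append((self._action, attrs["name"]))
                self._action = None  # keep one query field per form

    def handle_endtag(self, tag):
        if tag == "form":
            self._action = None


def build_query_urls(seed_url, page_html, keywords):
    """Return one query URL per (search form, keyword) pair."""
    finder = SearchFormFinder()
    finder.feed(page_html)
    urls = []
    for action, field in finder.forms:
        base = urljoin(seed_url, action)  # resolve relative form actions
        sep = "&" if "?" in base else "?"
        for keyword in keywords:
            urls.append(base + sep + urlencode({field: keyword}))
    return urls


if __name__ == "__main__":
    sample_page = (
        '<html><body><form action="/search" method="get">'
        '<input type="text" name="q"><input type="submit">'
        "</form></body></html>"
    )
    for url in build_query_urls(
        "https://example.org/index.html",
        sample_page,
        ["web crawler", "information retrieval"],
    ):
        print(url)
```

A crawler built along these lines would then fetch each generated URL and extract links from the result pages; that step, along with POST forms and JavaScript-driven search boxes, is left out of this sketch.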
Pages: 584-590
Page count: 7
Related papers
50 in total
  • [11] A novel incremental parallel web crawler based on focused crawling
    Huang, Qiuyan
    Li, Qingzhong
    Yan, Zhongmin
    Fu, Hong
    Journal of Computational Information Systems, 2013, 9 (06): : 2461 - 2469
  • [12] LSCrawler: A framework for an enhanced focused web crawler based on link semantics
    Yuvarani, M.
    Iyengar, N. Ch. S. N.
    Kannan, A.
    2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 794 - 797
  • [13] Web page sorting algorithm based on query keyword distance relation
    Yang, Han
    Cui, HongGang
    Tang, Hao
    GREEN ENERGY AND SUSTAINABLE DEVELOPMENT I, 2017, 1864
  • [14] An Enhanced Semantic Focused Web Crawler Based on Hybrid String Matching Algorithm
    Prabha, K. S. Sakunthala
    Mahesh, C.
    Raja, S. P.
    CYBERNETICS AND INFORMATION TECHNOLOGIES, 2021, 21 (02) : 105 - 120
  • [15] Weakly supervised learning for an effective focused web crawler
    Dhanith, P. R. Joe
    Saeed, Khalid
    Rohith, G.
    Raja, S. P.
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 132
  • [16] iSurfer: a focused Web crawler based on incremental learning from positive samples
    Ye, YM
    Ma, FY
    Lu, YM
    Chiu, M
    Huang, J
    ADVANCED WEB TECHNOLOGIES AND APPLICATIONS, 2004, 3007 : 122 - 134
  • [17] A Survey about Algorithms Utilized by Focused Web Crawler
    Yong-Bin Yu
    Shi-Lei Huang
    Nyima Tashi
    Huan Zhang
    Fei Lei
    Lin-Yang Wu
    Journal of Electronic Science and Technology, 2018, 16 (02) : 129 - 138
  • [18] Keyword Aggregate Query Based on Query Template
    Zhu, Bin
    Yuan, Fang
    Wang, Yu
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION APPLICATIONS (ICCIA 2012), 2012, : 715 - 720
  • [19] A survey about algorithms utilized by focused web crawler
    Yu Y.-B.
    Huang S.-L.
    Tashi N.
    Zhang H.
    Lei F.
    Wu L.-Y.
    Journal of Electronic Science and Technology, 2018, 16 (02) : 129 - 138
  • [20] wHunter: A focused web crawler - A tool for digital library
    Huang, Y
    Ye, YM
    DIGITAL LIBRARIES: INTERNATIONAL COLLABORATION AND CROSS-FERTILIZATION, PROCEEDINGS, 2004, 3334 : 519 - 522