Keyword query based focused Web crawler

Cited by: 28
Authors
Kumar, Manish [1]
Bindal, Ankit [1]
Gautam, Robin [1]
Bhatia, Rajesh [1]
Affiliations
[1] PEC Univ Technol, Chandigarh 160012, India
Keywords
Web crawler; Information retrieval; Focused Web crawler; Query-based crawler
DOI
10.1016/j.procs.2017.12.075
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Finding information on the Web is difficult because of the extremely large volume of data. Search engines can facilitate this task, but it is still difficult to cover all the webpages on the Web. This paper proposes a query-based crawler in which a set of keywords relevant to the user's topic of interest is used to issue queries to search interfaces. These search interfaces are found on the webpages of the website corresponding to the seed URL. This helps the crawler obtain the most relevant links from the domain without crawling that domain in depth. No existing focused crawling approach uses a query-based approach to find webpages of interest. In the proposed crawler, a list of keywords is passed to the search query interfaces found on the websites. The proposed work returns the most relevant information for the keywords in a particular domain without crawling through the many irrelevant links in between. (C) 2018 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the scientific committee of the 6th International Conference on Smart Computing and Communications.
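The idea described in the abstract, passing a keyword list to search interfaces found on the seed page, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `SearchFormFinder`, the function `build_query_urls`, and the heuristic "a GET form containing a text or search input is a search interface" are all assumptions made for illustration. The sketch only builds the query URLs; fetching them and extracting result links is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class SearchFormFinder(HTMLParser):
    """Collect GET forms that contain a text or search input.

    Assumed heuristic for "search interface"; the paper may use another one.
    """

    def __init__(self):
        super().__init__()
        self.forms = []       # (action, input_name) pairs, one per matching form
        self._action = None   # action of the GET form currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and (attrs.get("method") or "get").lower() == "get":
            self._action = attrs.get("action") or ""
        elif tag == "input" and self._action is not None:
            # An <input> without a type attribute defaults to a text field.
            kind = (attrs.get("type") or "text").lower()
            if kind in ("text", "search") and attrs.get("name"):
                self.forms.append((self._action, attrs["name"]))
                self._action = None  # keep only the first text field per form

    def handle_endtag(self, tag):
        if tag == "form":
            self._action = None


def build_query_urls(seed_url, html, keywords):
    """Return one query URL per (search form, keyword) pair found on the seed page."""
    finder = SearchFormFinder()
    finder.feed(html)
    urls = []
    for action, field in finder.forms:
        # Simplification: assumes the form action carries no query string of its own.
        base = urljoin(seed_url, action)
        for keyword in keywords:
            urls.append(base + "?" + urlencode({field: keyword}))
    return urls
```

For a seed page at a placeholder address such as `https://example.org/index.html` containing `<form action="/search" method="get"><input type="text" name="q"></form>`, calling `build_query_urls` with the keywords `["web crawler", "retrieval"]` yields `https://example.org/search?q=web+crawler` and `https://example.org/search?q=retrieval`. POST forms and forms without a named text field are skipped in this sketch.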
Pages: 584-590
Number of pages: 7