Keyword query based focused Web crawler

Cited by: 28

Authors:

Kumar, Manish [1]

Bindal, Ankit [1]

Gautam, Robin [1]

Bhatia, Rajesh [1]

Affiliations:

[1] PEC Univ Technol, Chandigarh 160012, India

Source:

6TH INTERNATIONAL CONFERENCE ON SMART COMPUTING AND COMMUNICATIONS | 2018 / Volume 125

Keywords:

Web crawler; Information retrieval; Focused Web Crawler; Query based crawler

DOI:

10.1016/j.procs.2017.12.075

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

Finding information on the Web is difficult because of the extremely large volume of data. Search engines can facilitate this task, but it is still difficult to cover all the webpages present on the Web. This paper proposes a query-based crawler in which a set of keywords relevant to the user's topic of interest is used to submit queries to search interfaces. These search interfaces are found on a webpage of the website corresponding to the seed URL. This helps the crawler obtain the most relevant links from the domain without actually going deep into that domain. No existing focused crawling approach uses a query-based approach to find webpages of interest. In the proposed crawler, a list of keywords is passed to the search query interfaces found on the websites. The proposed work gives the most relevant information based on the keywords in a particular domain without actually crawling through many irrelevant links in between. (C) 2018 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the scientific committee of the 6th International Conference on Smart Computing and Communications.
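
The idea described in the abstract, locating a search interface on the seed page and submitting topic keywords to it, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `SearchFormFinder`, the function `build_query_urls`, the restriction to GET forms with a single text field, and the example URL are all assumptions made for illustration. The sketch only builds the query URLs and does not fetch them.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class SearchFormFinder(HTMLParser):
    """Hypothetical helper: collect GET forms that contain a text/search input."""

    def __init__(self):
        super().__init__()
        self.forms = []        # (action, input_name) pairs, one per form
        self._action = None    # action of the form currently being parsed
        self._in_form = False  # True while inside a candidate GET form

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            method = (attrs.get("method") or "get").lower()
            if method == "get":
                self._in_form = True
                self._action = attrs.get("action") or ""
        elif tag == "input" and self._in_form:
            input_type = (attrs.get("type") or "text").lower()
            name = attrs.get("name")
            if input_type in ("text", "search") and name:
                self.forms.append((self._action, name))
                self._in_form = False  # assumption: first text field is the query box

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def build_query_urls(seed_url, html, keywords):
    """Return one query URL per (search form, keyword) pair found on the seed page."""
    finder = SearchFormFinder()
    finder.feed(html)
    urls = []
    for action, field in finder.forms:
        base = urljoin(seed_url, action)  # resolve relative form actions
        for keyword in keywords:
            # Simplification: assumes the action has no existing query string.
            urls.append(base + "?" + urlencode({field: keyword}))
    return urls
```

A crawler built along these lines would fetch each returned URL and treat the result links as candidates, rather than following every link from the seed page. Real search interfaces often use POST forms, hidden fields, or JavaScript, which this sketch does not handle.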


Pages: 584-590

Number of pages: 7

