Keyword query based focused Web crawler

Cited by: 28
Authors
Kumar, Manish [1]
Bindal, Ankit [1]
Gautam, Robin [1]
Bhatia, Rajesh [1]
Affiliation
[1] PEC Univ Technol, Chandigarh 160012, India
Keywords
Web crawler; Information retrieval; Focused Web crawler; Query-based crawler
DOI
10.1016/j.procs.2017.12.075
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Finding information on the Web is difficult because of the extremely large volume of data. A search engine can facilitate this task, but it is still difficult to cover all the webpages present on the Web. This paper proposes a query-based crawler in which a set of keywords relevant to the user's topic of interest is used to issue queries to search interfaces. These search interfaces are found on the webpages of the website corresponding to the seed URL. This helps the crawler obtain the most relevant links from the domain without crawling that domain in depth. No existing focused crawling approach uses a query-based approach to find webpages of interest. In the proposed crawler, a list of keywords is passed to the search query interfaces found on the websites. The proposed work returns the most relevant information for the keywords in a particular domain without crawling through the many irrelevant links in between. (C) 2018 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the scientific committee of the 6th International Conference on Smart Computing and Communications.
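The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: locate a search form on the seed page and build one query URL per keyword. It assumes the search interface is a plain HTML form submitted via GET and that a text or search input marks a search form. The names `SearchFormFinder` and `build_query_urls` are hypothetical and do not come from the paper.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class SearchFormFinder(HTMLParser):
    """Collect (action, field name) for each form that has a text-like input."""

    def __init__(self):
        super().__init__()
        self.forms = []        # list of (form action, query field name)
        self._action = None    # action of the form currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._action = attrs.get("action") or ""
        elif tag == "input" and self._action is not None:
            # Heuristic (assumption): a text or search input marks a search form.
            input_type = (attrs.get("type") or "text").lower()
            if input_type in ("text", "search") and attrs.get("name"):
                self.forms.append((self._action, attrs["name"]))
                self._action = None  # keep only the first text field per form

    def handle_endtag(self, tag):
        if tag == "form":
            self._action = None


def build_query_urls(seed_url, page_html, keywords):
    """Return one GET query URL per (search form, keyword) pair."""
    finder = SearchFormFinder()
    finder.feed(page_html)
    urls = []
    for action, field in finder.forms:
        base = urljoin(seed_url, action)  # resolve relative form actions
        for keyword in keywords:
            urls.append(base + "?" + urlencode({field: keyword}))
    return urls
```

A crawler built along these lines would fetch each returned URL and extract result links from the response. Fetching, politeness (robots.txt, rate limits), and POST-based forms are left out of this sketch.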
Pages: 584-590
Number of pages: 7
Related papers
50 in total
  • [41] An architecture for a focused trend parallel Web crawler with the application of clickstream analysis
    Ahmadi-Abkenari, Fatemeh
    Selamat, Ali
    INFORMATION SCIENCES, 2012, 184 (01) : 266 - 281
  • [42] Designing a Modular and Distributed Web Crawler Focused on Unstructured Cybersecurity Intelligence
    Jenkins, Donovan
    Liebrock, Lorie M.
    Urias, Vince
    2021 INTERNATIONAL CARNAHAN CONFERENCE ON SECURITY TECHNOLOGY (ICCST), 2021,
  • [43] Ontology-based web crawler
    Ganesh, S
    Jayaraj, M
    Kalyan, V
    Murthy, S
    Aghila, G
    ITCC 2004: INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, VOL 2, PROCEEDINGS, 2004, : 337 - 341
  • [44] Methodologies for crawler based Web surveys
    Thelwall, M
    INTERNET RESEARCH, 2002, 12 (02) : 124 - 138
  • [45] LABRADOR: Efficiently publishing relational databases on the web by using keyword-based query interfaces
    Mesquita, Filipe
    da Silva, Altigran S.
    de Moura, Edleno S.
    Calado, Pavel
    Laender, Alberto H. F.
    INFORMATION PROCESSING & MANAGEMENT, 2007, 43 (04) : 983 - 1004
  • [46] A novel focused crawler combining Web space evolution and domain ontology
    Liu, Jingfa
    Li, Xin
    Zhang, Qiansheng
    Zhong, Guo
    KNOWLEDGE-BASED SYSTEMS, 2022, 243
  • [47] An ontology-supported web focused-crawler for Java programs
    Dept. of Computer and Communication Engineering, St. John's University, Taiwan
    Unknown
    IEEE Int. Conf. Ubi-Media Comput., U-Media, (266-271):
  • [48] Focused crawler for events
    Farag, Mohamed M. G.
    Lee, Sunshin
    Fox, Edward A.
    INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES, 2018, 19 (01) : 3 - 19
  • [49] QFAS-KE: Query focused answer summarization using keyword extraction
    Goyal, Rupali
    Kumar, Parteek
    Singh, V. P.
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (04)
  • [50] Query-topic focused web pages summarization
    Yoo, Seung Yeol
    Hoffmann, Achim
    PRICAI 2006: TRENDS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4099 : 533 - 543