A Semantic Focused Web Crawler Based on a Knowledge Representation Schema

Cited by: 10
Authors
Hernandez, Julio [1]
Marin-Castro, Heidy M. [2]
Morales-Sandoval, Miguel [1]
Affiliations
[1] Cinvestav Tamaulipas, Cd Victoria 87130, Tamps, Mexico
[2] Catedras CONACYT Univ Autonoma Tamaulipas, Cd Victoria 87000, Tamps, Mexico
Source
APPLIED SCIENCES-BASEL | 2020, Vol. 10, Issue 11
Keywords
crawling; semantic focused web crawler; knowledge representation schema; web pages; similarity
DOI
10.3390/app10113837
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The Web has become the main source of information in the digital world, expanding to heterogeneous domains and growing continuously. With a search engine, users can systematically search the Web for particular information using a text query, relying on a domain-unaware web search tool that maintains real-time information. One type of web search tool is the semantic focused web crawler (SFWC), which exploits the semantics of the Web through ontology heuristics to determine which web pages belong to the domain defined by the query. An SFWC is highly dependent on the ontological resource, which is created by human domain experts. This work presents a novel SFWC based on a generic knowledge representation schema to model the crawler's domain, thus reducing the complexity and cost of constructing a more formal representation, as is the case when using ontologies. Furthermore, a similarity measure that combines the inverse document frequency (IDF) metric, the standard deviation, and the arithmetic mean is proposed for the SFWC. This measure filters web page contents in accordance with the domain of interest during the crawling task. A set of experiments was run over the domains of computer science, politics, and diabetes to validate and evaluate the proposed crawler. The quantitative (harvest ratio) and qualitative (Fleiss' kappa) evaluations demonstrate the suitability of the proposed SFWC for crawling the Web using a knowledge representation schema instead of a domain ontology.
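The quantities named in the abstract can be illustrated with a small Python sketch. The harvest ratio (relevant pages divided by pages crawled) and Fleiss' kappa follow their standard definitions. The abstract does not give the exact form of the similarity measure, so `page_score` and `relevance_threshold` below are hypothetical: they assume a page is scored by the mean IDF of its tokens and accepted when that score is at least the corpus mean minus one standard deviation. This is one plausible reading, not the authors' formula.

```python
import math
from statistics import mean, pstdev


def idf_table(docs):
    """Standard IDF, log(N / df), over a list of tokenized domain documents."""
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    return {term: math.log(n / count) for term, count in df.items()}


def page_score(tokens, idf):
    """ASSUMPTION: mean IDF of the page tokens; unknown terms contribute 0."""
    if not tokens:
        return 0.0
    return sum(idf.get(t, 0.0) for t in tokens) / len(tokens)


def relevance_threshold(docs, idf):
    """ASSUMPTION: accept pages scoring at least (mean - std) of corpus scores."""
    scores = [page_score(d, idf) for d in docs]
    return mean(scores) - pstdev(scores)


def harvest_ratio(relevant_pages, crawled_pages):
    """Fraction of crawled pages that were judged relevant."""
    return relevant_pages / crawled_pages


def fleiss_kappa(table):
    """table[i][j] = number of raters assigning item i to category j.

    Every item must have the same number of raters (at least 2).
    Undefined (division by zero) if all ratings fall in one category.
    """
    n_items = len(table)
    n_raters = sum(table[0])
    total = n_items * n_raters
    # Share of all ratings that went to each category.
    p_j = [sum(row[j] for row in table) / total for j in range(len(table[0]))]
    # Per-item agreement among rater pairs.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_bar = sum(p_i) / n_items
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    corpus = [["web", "crawler"], ["web", "ontology"], ["web", "semantic"]]
    idf = idf_table(corpus)
    threshold = relevance_threshold(corpus, idf)
    print(page_score(["crawler", "ontology"], idf) >= threshold)
    print(harvest_ratio(30, 40))
    print(fleiss_kappa([[3, 0], [0, 3]]))
```

In this toy corpus the term "web" occurs in every document and so receives an IDF of zero, which is the property that lets an IDF-based score favor domain-specific vocabulary over ubiquitous terms.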
Pages: 21