DBpedia and the live extraction of structured data from Wikipedia

Cited by: 54
Authors
Morsey, Mohamed [1]
Lehmann, Jens
Auer, Soeren [1]
Stadler, Claus
Hellmann, Sebastian
Affiliations
[1] Univ Leipzig, Dept Comp Sci, Res Grp, Leipzig, Germany
Keywords
Knowledge extraction; RDF; Wikipedia; Triplestore; Knowledge management; Data management; Databases; Websites
DOI
10.1108/00330331211221828
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Purpose - DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the web using Linked Data and SPARQL. However, the DBpedia release process is heavyweight, and releases are sometimes based on data that is several months old. DBpedia-Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. This paper seeks to address these issues.
Design/methodology/approach - Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of recently updated articles. DBpedia-Live processes that stream on the fly to obtain RDF data and writes the extracted data back to DBpedia. DBpedia-Live publishes the newly added and deleted triples in files to enable synchronization between the DBpedia endpoint and other DBpedia mirrors.
Findings - During the realization of DBpedia-Live the authors learned that it is crucial to process Wikipedia updates in a priority queue. Recently updated Wikipedia articles should have the highest priority, ahead of mapping changes and unmodified pages. An overall finding is that the emerging Web of Data offers plenty of opportunities for librarians.
Practical implications - DBpedia has had a great effect on the Web of Data and has become a crystallization point for it. Many companies and researchers use DBpedia and its public services to improve their applications and research approaches. The DBpedia-Live framework improves DBpedia further by synchronizing it with Wikipedia in a timely manner, which is relevant for many use cases requiring up-to-date information.
Originality/value - The new DBpedia-Live framework adds new features to the old DBpedia-Live framework, e.g. abstract extraction, ontology changes, and changeset publication.
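The priority ordering described under Findings can be illustrated with a small sketch. This is not the DBpedia-Live implementation; it only assumes the three priority levels named in the abstract, and the class names `Priority` and `UpdateQueue` as well as the page titles are hypothetical.

```python
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    # Lower value is served first; the three levels follow the abstract.
    LIVE_UPDATE = 0      # recently updated Wikipedia article
    MAPPING_CHANGE = 1   # page affected by a changed mapping
    UNMODIFIED = 2       # page that has not changed


class UpdateQueue:
    """Hypothetical queue that serves pages by priority, FIFO within a level."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so equal priorities stay FIFO.
        self._counter = itertools.count()

    def push(self, title, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), title))

    def pop(self):
        priority, _, title = heapq.heappop(self._heap)
        return title, priority

    def __len__(self):
        return len(self._heap)


queue = UpdateQueue()
queue.push("Leipzig", Priority.UNMODIFIED)
queue.push("Berlin", Priority.MAPPING_CHANGE)
queue.push("Wikipedia", Priority.LIVE_UPDATE)
queue.push("Germany", Priority.LIVE_UPDATE)

# Drain the queue and record the order in which pages would be processed.
order = [queue.pop()[0] for _ in range(len(queue))]
print(order)
```

In this sketch the two live updates are served first, in arrival order, followed by the mapping change and then the unmodified page.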
Pages: 157-181
Number of pages: 25
Related papers
50 in total
  • [31] Improving the Extraction of Bilingual Terminology from Wikipedia
    Erdmann, Maike
    Nakayama, Kotaro
    Hara, Takahiro
    Nishio, Shojiro
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2009, 5 (04)
  • [32] Semantic Sense Extraction From Wikipedia Pages
    Pirrone, Roberto
    Pipitone, Arianna
    Russo, Giuseppe
    3RD INTERNATIONAL CONFERENCE ON HUMAN SYSTEM INTERACTION, 2010: 543 - 547
  • [33] THE EXTRACTION OF LINE-STRUCTURED DATA FROM ENGINEERING DRAWINGS
    CLEMENT, TP
    PATTERN RECOGNITION, 1981, 14 (1-6) : 43 - 52
  • [34] Automated Extraction of Structured Data from the Social Network Instagram
    Frantis, Petr
    Bures, Michel
    Coufalikova, Aneta
    Klaban, Ivo
    PROCEEDINGS OF THE 23RD EUROPEAN CONFERENCE ON CYBER WARFARE AND SECURITY, ECCWS 2024, 2024, 23 : 157 - 164
  • [35] Interactive tuples extraction from semi-structured data
    Gilleron, Remi
    Marty, Patrick
    Tommasi, Marc
    Torre, Fabien
    2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006: 997 - 1004
  • [36] Interactive Data Extraction from Semi-Structured Text
    Broman, Per
    Thalheim, Bernhard
    INFORMATION MODELLING AND KNOWLEDGE BASES XXIII, 2012, 237 : 1 - 19
  • [37] Wikipedia - Sociological Live Coverage
    Danielewicz, Michal
    STUDIA SOCJOLOGICZNE, 2010, (02): 127 - 156
  • [38] Automatic Question-Answering Based on Wikipedia Data Extraction
    Huang, Xiangzhou
    Wei, Baogang
    Zhang, Yin
    2015 10TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND KNOWLEDGE ENGINEERING (ISKE), 2015: 314 - 317
  • [39] MMKG: An approach to generate metallic materials knowledge graph based on DBpedia and Wikipedia
    Zhang, Xiaoming
    Liu, Xin
    Li, Xin
    Pan, Dongyu
    COMPUTER PHYSICS COMMUNICATIONS, 2017, 211 : 98 - 112
  • [40] Web Service for Data Extraction from Semi-structured Data Sources
    Yashina, Marina V.
    Nakonechnyy, Ivan I.
    PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON DEPENDABILITY AND COMPLEX SYSTEMS DEPCOS-RELCOMEX, 2014, 286 : 499 - 510