DBpedia and the live extraction of structured data from Wikipedia

Cited: 54
Authors
Morsey, Mohamed [1 ]
Lehmann, Jens
Auer, Soeren [1 ]
Stadler, Claus
Hellmann, Sebastian
Institutions
[1] Univ Leipzig, Dept Comp Sci, Res Grp, Leipzig, Germany
Keywords
Knowledge extraction; RDF; Wikipedia; Triplestore; Knowledge management; Data management; Databases; Websites;
DOI
10.1108/00330331211221828
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Number
0812 ;
Abstract
Purpose - DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases, and freely publishes the results on the web using Linked Data and SPARQL. However, the DBpedia release process is heavyweight, and releases are sometimes based on data that is several months old. DBpedia-Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. This paper seeks to address these issues.
Design/methodology/approach - Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of recently updated articles. DBpedia-Live processes that stream on the fly to obtain RDF data and stores the extracted data back in DBpedia. DBpedia-Live publishes the newly added/deleted triples in files in order to enable synchronization between the DBpedia endpoint and other DBpedia mirrors.
Findings - During the realization of DBpedia-Live the authors learned that it is crucial to process Wikipedia updates in a priority queue. Recently updated Wikipedia articles should have the highest priority, above mapping changes and unmodified pages. An overall finding is that the emerging Web of Data offers plenty of opportunities for librarians.
Practical implications - DBpedia had, and continues to have, a great effect on the Web of Data and became a crystallization point for it. Many companies and researchers use DBpedia and its public services to improve their applications and research approaches. The DBpedia-Live framework improves DBpedia further by synchronizing it with Wikipedia in a timely fashion, which is relevant for many use cases requiring up-to-date information.
Originality/value - The new DBpedia-Live framework adds features that the old DBpedia-Live framework lacked, e.g. abstract extraction, handling of ontology changes, and changeset publication.
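The prioritization described under Findings — article edits first, then mapping changes, then unmodified pages — can be sketched with a small priority queue. This is an illustrative model only, not DBpedia-Live's actual implementation; the class and priority names are assumptions made for the example.

```python
import heapq
import itertools

# Illustrative priority levels mirroring the ordering described in the
# abstract: recently updated articles first, then mapping changes, then
# unmodified pages re-extracted periodically. Lower value = higher priority.
ARTICLE_UPDATE, MAPPING_CHANGE, UNMODIFIED_PAGE = 0, 1, 2


class UpdateQueue:
    """A minimal priority queue for incoming Wikipedia page updates."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: arrival order

    def push(self, page_title, priority):
        # The counter keeps equal-priority items in FIFO order and avoids
        # comparing page titles when priorities tie.
        heapq.heappush(self._heap, (priority, next(self._counter), page_title))

    def pop(self):
        priority, _, page_title = heapq.heappop(self._heap)
        return page_title, priority


queue = UpdateQueue()
queue.push("Berlin", UNMODIFIED_PAGE)              # periodic re-extraction
queue.push("Leipzig", ARTICLE_UPDATE)              # just edited on Wikipedia
queue.push("Mapping:Infobox_city", MAPPING_CHANGE) # extraction mapping changed

# Article edits drain first, then mapping changes, then unmodified pages.
order = [queue.pop()[0] for _ in range(3)]
# → ["Leipzig", "Mapping:Infobox_city", "Berlin"]
```

A heap keeps both insertion and extraction at O(log n), which matters when the update stream outpaces the extraction framework and the queue grows.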
Pages: 157 - 181
Number of pages: 25
Related Papers
50 items total
  • [41] Extraction of Linked Data Triples from Japanese Wikipedia Text of Ukiyo-e Painters
    Kimura, Fuminori
    Mitsui, Katsuhiro
    Maeda, Akira
    2013 INTERNATIONAL CONFERENCE ON CULTURE AND COMPUTING (CULTURE AND COMPUTING 2013), 2013, : 192 - +
  • [42] Structured knowledge creation for Urdu language: A DBpedia approach
    Rasham, Shanza
    Khan, Habib Ullah
    Maqbool, Fahad
    Razzaq, Saad
    Anwar, Sajid
    Ilyas, Muhammad
    EXPERT SYSTEMS, 2025, 42 (01)
  • [43] WC3: Analyzing the Style of Metadata Annotation Among Wikipedia Articles by Using Wikipedia Category and the DBpedia Metadata Database
    Yoshioka, Masaharu
    KNOWLEDGE GRAPHS AND LANGUAGE TECHNOLOGY, 2017, 10579 : 119 - 136
  • [44] Weakly Supervised Multilingual Causality Extraction from Wikipedia
    Hashimoto, Chikara
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 2988 - 2999
  • [45] Information Extraction from Wikipedia Using Pattern Learning
    Mihaltz, Marton
    ACTA CYBERNETICA, 2010, 19 (04): : 677 - 694
  • [46] Family Matters: Company Relations Extraction from Wikipedia
    Kuznetsov, Artem
    Braslavski, Pavel
    Ivanov, Vladimir
    KNOWLEDGE ENGINEERING AND SEMANTIC WEB, KESW 2016, 2016, 649 : 81 - 92
  • [47] A generic method for multi word extraction from Wikipedia
    Bekavac, Bozo
    Tadic, Marko
    PROCEEDINGS OF THE ITI 2008 30TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY INTERFACES, 2008, : 663 - 667
  • [48] Semantic resource extraction from Wikipedia category lattice
    Collin, Olivier
    Gaillard, Benoit
    Bouraoui, Jean-Leon
    Girault, Thomas
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : E23 - E29
  • [49] Automatic Extraction of Axioms from Wikipedia Using SPARQL
    Haidar-Ahmad, Lara
    Zouaq, Amal
    Gagnon, Michel
    SEMANTIC WEB, ESWC 2016, 2016, 9989 : 60 - 64
  • [50] Relation Extraction from Wikipedia Leveraging Intrinsic Patterns
    Gu, Yulong
    Liu, Weidong
    Song, Jiaxing
    2015 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT), VOL 1, 2015, : 181 - 186