Mapping Large Scale Research Metadata to Linked Data: A Performance Comparison of HBase, CSV and XML

Cited by: 6
Authors
Vahdati, Sahar [1]
Karim, Farah [1]
Huang, Jyun-Yao [2]
Lange, Christoph [1,3]
Affiliations
[1] Univ Bonn, Bonn, Germany
[2] Natl Chung Hsing Univ, Taichung 40227, Taiwan
[3] Fraunhofer IAIS, St Augustin, Germany
DOI
10.1007/978-3-319-24129-6_23
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
OpenAIRE, the Open Access Infrastructure for Research in Europe, comprises a database of all EC FP7 and H2020 funded research projects, including metadata of their results (publications and datasets). These data are stored in an HBase NoSQL database, post-processed, and exposed as HTML for human consumption and as XML through a web service interface. CSV is generated internally as an intermediate format to facilitate statistical computations. To interlink the OpenAIRE data with related data on the Web, we aim to export them as Linked Open Data (LOD). The LOD export must integrate into the overall data processing workflow, in which derived data are regenerated from the base data every day. We thus faced the challenge of identifying the best-performing conversion approach. We evaluated the performance of creating LOD in three ways: by a MapReduce job on top of HBase, by mapping the intermediate CSV files, and by mapping the XML output.
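To make the CSV route mentioned in the abstract concrete, the following is a minimal sketch of mapping CSV rows to RDF triples in N-Triples syntax. It is not the authors' implementation. The column names (`id`, `title`, `year`), the `http://example.org/` namespace, and the function names are hypothetical and chosen only for illustration; the two Dublin Core Terms properties are real vocabulary terms, but their use here is an assumption.

```python
import csv
import io

# Hypothetical base namespace for minted resource IRIs (an assumption).
BASE = "http://example.org/"
# Dublin Core Terms properties; picking them for this data is an assumption.
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_ISSUED = "http://purl.org/dc/terms/issued"


def escape_literal(value):
    """Escape backslash, double quote and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text):
    """Map each CSV row (assumed columns: id, title, year) to two triples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for row in reader:
        subject = f"<{BASE}publication/{row['id']}>"
        triples.append(
            f'{subject} <{DCT_TITLE}> "{escape_literal(row["title"])}" .'
        )
        triples.append(
            f'{subject} <{DCT_ISSUED}> "{escape_literal(row["year"])}" .'
        )
    return triples


# Made-up sample row, not taken from the OpenAIRE data.
SAMPLE = "id,title,year\np1,Mapping Metadata,2015\n"

if __name__ == "__main__":
    for line in csv_to_ntriples(SAMPLE):
        print(line)
```

Because each row is mapped independently of the others, the same per-row function could in principle be reused inside a MapReduce mapper or fed from parsed XML records, which is what makes the three input formats comparable.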
Pages: 261-273
Page count: 13