A MapReduce-based Approach to Scale Big Semantic Data Compression with HDT

Cited by: 3
Authors
Gimenez, J. M. [1]
Fernandez, J. D. [2]
Martinez, M. A. [3]
Affiliations
[1] Univ Lyon, UJM St Etienne, CNRS, Lab Hubert Curien, UMR 5516, St Etienne, France
[2] Vienna Univ Econ & Business, Vienna, Austria
[3] Univ Valladolid, Dept Informat, DataWeb Res, Segovia, Spain
Funding
European Union Horizon 2020
Keywords
Compression; HDT; MapReduce; RDF; Semantic Web; Web of Data
DOI
10.1109/TLA.2017.7959346
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Data generation and publication on the Web have increased over recent years. This phenomenon, usually known as "Big Data", poses new challenges related to the Volume, Velocity, and Variety ("the three V's") of data. The Semantic Web offers the means to deal with variety: RDF (Resource Description Framework) models data as subject-predicate-object triples, which makes it possible to represent and interconnect RDF triples to build a true Web of Data. Nonetheless, a problem arises when big RDF collections must be stored, exchanged, and/or queried: the existing serialization formats are highly verbose, so the remaining Big Semantic Data challenges (volume and velocity) are aggravated. HDT addresses this issue by proposing a binary serialization format based on compact data structures that allows RDF to be compressed and also queried without prior decompression. Thus, HDT reduces data volume and increases retrieval velocity. However, this achievement comes at the cost of an RDF-to-HDT serialization that is expensive in terms of computational resources and time. Therefore, HDT alleviates the velocity and volume challenges for the end user, but moves the Big Data challenges to the data publisher. In this work we present HDT-MR, a MapReduce-based algorithm that allows RDF datasets to be serialized to HDT in a distributed way, reducing processing resources and time while also enabling larger datasets to be compressed.
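The idea of replacing verbose RDF terms with compact identifiers can be illustrated with a minimal, single-machine sketch. It assumes a simplified dictionary encoding in which every distinct term receives one integer ID from a single shared ID space, and it mimics a map step and a reduce step with ordinary Python functions. The function names (`map_terms`, `reduce_roles`, `build_dictionary`, `encode`) and the sample triples are hypothetical; this is not the HDT-MR algorithm or the actual HDT binary layout, only an illustration of why ID-based triples are smaller than their textual form.

```python
from collections import defaultdict


def map_terms(triple):
    """Map step (illustrative): emit one (term, role) pair per triple component."""
    s, p, o = triple
    return [(s, "S"), (p, "P"), (o, "O")]


def reduce_roles(pairs):
    """Reduce step (illustrative): group the roles observed for each distinct term."""
    roles = defaultdict(set)
    for term, role in pairs:
        roles[term].add(role)
    return roles


def build_dictionary(triples):
    """Assign integer IDs to distinct terms in lexicographic order.

    Assumption: one shared ID space for all terms, chosen for brevity.
    """
    pairs = [pair for triple in triples for pair in map_terms(triple)]
    roles = reduce_roles(pairs)
    return {term: i for i, term in enumerate(sorted(roles), start=1)}


def encode(triples, dictionary):
    """Replace each term by its ID and sort the resulting ID triples."""
    return sorted((dictionary[s], dictionary[p], dictionary[o]) for s, p, o in triples)


# Hypothetical sample data, used only for illustration.
SAMPLE_TRIPLES = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", '"Alice"'),
]
SAMPLE_DICTIONARY = build_dictionary(SAMPLE_TRIPLES)
SAMPLE_ENCODED = encode(SAMPLE_TRIPLES, SAMPLE_DICTIONARY)
```

In this sketch each repeated term such as `foaf:knows` is stored once in the dictionary and referenced by a small integer afterwards, which is the intuition behind trading a costly serialization step for a more compact result.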
Pages: 1270-1277
Number of pages: 8