A MapReduce-based Approach to Scale Big Semantic Data Compression with HDT

Cited by: 3

Authors:

Gimenez, J. M. [1]

Fernandez, J. D. [2]

Martinez, M. A. [3]

Affiliations:

[1] Univ Lyon, UJM St Etienne, CNRS, Lab Hubert Curien, UMR 5516, St Etienne, France

[2] Vienna Univ Econ & Business, Vienna, Austria

[3] Univ Valladolid, Dept Informat, DataWeb Res, Segovia, Spain

Source:

IEEE LATIN AMERICA TRANSACTIONS | 2017 / Vol. 15 / Issue 07

Funding:

European Union Horizon 2020;

Keywords:

Compression; HDT; MapReduce; RDF; Semantic Web; Web of Data;

DOI:

10.1109/TLA.2017.7959346

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Data generation and publication on the Web have increased in recent years. This phenomenon, usually known as "Big Data", poses new challenges related to the Volume, Velocity, and Variety ("the three V's") of data. The Semantic Web offers the means to deal with variety: RDF (Resource Description Framework) is used to model data in the form of subject-predicate-object triples. In this way, it is possible to represent and interconnect RDF triples to build a true Web of Data. Nonetheless, a problem arises when big RDF collections must be stored, exchanged, and/or queried: because the existing serialization formats are highly verbose, the remaining Big Semantic Data challenges (volume and velocity) are aggravated. HDT addresses this issue by proposing a binary serialization format based on compact data structures that allows RDF to be compressed, and also to be queried without prior decompression. Thus, HDT reduces data volume and increases retrieval velocity. However, this achievement comes at the cost of an RDF-to-HDT serialization that is expensive in terms of computational resources and time. Therefore, HDT alleviates the velocity and volume challenges for the end user, but moves the Big Data challenges to the data publisher. In this work we present HDT-MR, a MapReduce-based algorithm that allows RDF datasets to be serialized to HDT in a distributed way, reducing processing resources and time while also enabling larger datasets to be compressed.

Citation

Pages: 1270-1277

Page count: 8
