Query Optimization for massive RDF data based on Spark

Cited by: 2

Authors:

Li, Shaohui [1]

Shen, Derong [1]

Kou, Yue [1]

Yang, Dan [1]

Affiliations:

[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China

Source:

2018 4TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING AND COMMUNICATIONS (BIGCOM 2018) | 2018

Funding:

National Natural Science Foundation of China

Keywords:

Spark; RDF; Distributed cluster; SPARQL

DOI:

10.1109/BIGCOM.2018.00042

Chinese Library Classification (CLC):

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

SPARQL (SPARQL Protocol and RDF Query Language) is a query language and data access protocol designed for RDF. Although it is defined for the RDF data model developed by the W3C, it can be used with any data resource that can be represented as RDF. With the explosive growth of web information resources, more and more data is stored in RDF form. Finding and retrieving useful information in massive data has become a major challenge, and efficient search and effective querying have become a focus of research. In this paper, we design an efficient optimization method that finds a semantic connection chain in the system (Sparkllink). Data is stored on the Hadoop file system (HDFS). Built on the Spark framework with its efficient distributed in-memory computing, the system achieves efficient search and query optimization for massive RDF data. Our work includes the following mechanisms: (1) using vertical partitioning as the data storage structure; (2) collecting data statistics twice; (3) using a semantics-based information connection chain. The system supports queries over massive numbers of triples in a distributed environment with efficient query processing. The experiments in this paper compare against SPARQLGX, a recent RDF system on the Spark platform; our system is more efficient in data search than SPARQLGX.
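As a rough illustration of mechanism (1), the sketch below shows what vertical partitioning of RDF triples generally means: one two-column (subject, object) table per predicate, so that a triple pattern with a known predicate only touches that predicate's table. This is plain in-memory Python, not Spark code and not the authors' implementation; the function names `vertical_partition` and `join_on_subject` and the sample data are hypothetical.

```python
from collections import defaultdict


def vertical_partition(triples):
    """Group (subject, predicate, object) triples into one
    (subject, object) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def join_on_subject(tables, pred_a, pred_b):
    """Answer the two-pattern query
        ?s pred_a ?x . ?s pred_b ?y
    by hash-joining the two predicate tables on the shared subject."""
    # Build a hash index over the second table: subject -> list of objects.
    index = defaultdict(list)
    for s, o in tables.get(pred_b, []):
        index[s].append(o)
    # Probe the index with each row of the first table.
    return [
        (s, x, y)
        for s, x in tables.get(pred_a, [])
        for y in index.get(s, [])
    ]


# Hypothetical sample data, for illustration only.
triples = [
    ("alice", "worksFor", "acme"),
    ("alice", "livesIn", "paris"),
    ("bob", "worksFor", "acme"),
    ("bob", "livesIn", "rome"),
    ("carol", "livesIn", "oslo"),
]

tables = vertical_partition(triples)
result = join_on_subject(tables, "worksFor", "livesIn")
```

In a distributed setting each predicate table would typically be a separate file or dataset on HDFS, and the join would be carried out by the cluster framework rather than by a local dictionary.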


Pages: 219-224

Page count: 6
