Query Optimization for massive RDF data based on Spark

Cited by: 2
Authors
Li, Shaohui [1]
Shen, Derong [1]
Kou, Yue [1]
Yang, Dan [1]
Affiliation
[1] Northeastern University, School of Computer Science and Engineering, Shenyang, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Spark; RDF; Distributed cluster; SPARQL
DOI
10.1109/BIGCOM.2018.00042
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
SPARQL (SPARQL Protocol and RDF Query Language) is a query language and data access protocol designed for RDF. It is defined over the RDF data model developed by the W3C, but it can be used with any data resource that can be represented in RDF. Web information resources are growing explosively, and more and more data is published in RDF form. Finding and retrieving useful information in such massive data has become a major challenge, and efficient search and query processing have become a focus of research. In this paper, we design an efficient optimization method for our system, Sparkllink, based on finding semantic connection chains. Sparkllink stores its data in the Hadoop Distributed File System (HDFS). The system is built on the Spark framework and its efficient distributed in-memory processing, and it achieves efficient search and query optimization over massive RDF data. Our work includes the following mechanisms: (1) vertical partitioning as the data storage structure; (2) two passes of data statistics; (3) semantics-based information connection chains. The system supports queries over massive numbers of triples in a distributed environment with efficient query processing. Our experiments compare the system with SPARQLGX, the latest Spark-based RDF system, and show that our system searches data more efficiently than SPARQLGX.
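The abstract names three mechanisms: vertical partitioning, data statistics and connection chains. A minimal single-machine sketch can make the general idea concrete. It rests on several assumptions, and all function names in it are hypothetical.

- The tables are plain Python lists rather than HDFS files processed by Spark jobs.
- The abstract does not say what the two statistics passes collect, so per-predicate row counts stand in for them.
- The "connection chain" is approximated by a greedy join order in which each next triple pattern shares a variable with the patterns already joined.

This is an illustration of vertical partitioning with statistics-guided join ordering, not the authors' algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # (subject, predicate, object); terms starting with "?" are variables
Binding = Dict[str, str]


def vertical_partition(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Vertical partitioning: one two-column (subject, object) table per predicate."""
    tables: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def predicate_sizes(tables: Dict[str, List[Tuple[str, str]]]) -> Dict[str, int]:
    """Stand-in statistic (assumption): number of rows in each predicate table."""
    return {p: len(rows) for p, rows in tables.items()}


def is_var(term: str) -> bool:
    return term.startswith("?")


def variables(pattern: Pattern) -> Set[str]:
    return {t for t in (pattern[0], pattern[2]) if is_var(t)}


def chain_order(patterns: List[Pattern], sizes: Dict[str, int]) -> List[Pattern]:
    """Hypothetical greedy chain: start from the smallest table, then repeatedly take
    the smallest remaining pattern that shares a variable with those already chosen."""
    if not patterns:
        return []
    remaining = sorted(patterns, key=lambda q: sizes.get(q[1], 0))
    order = [remaining.pop(0)]
    bound = variables(order[0])
    while remaining:
        connected = [q for q in remaining if variables(q) & bound]
        nxt = connected[0] if connected else remaining[0]  # fall back to a cross product
        remaining.remove(nxt)
        order.append(nxt)
        bound |= variables(nxt)
    return order


def match(pattern: Pattern, row: Tuple[str, str], binding: Binding) -> Optional[Binding]:
    """Extend a binding with one (subject, object) row, or return None on a conflict."""
    extended = dict(binding)
    for term, value in ((pattern[0], row[0]), (pattern[2], row[1])):
        if is_var(term):
            if extended.get(term, value) != value:
                return None
            extended[term] = value
        elif term != value:
            return None
    return extended


def evaluate(patterns: List[Pattern], tables: Dict[str, List[Tuple[str, str]]]) -> List[Binding]:
    """Join the predicate tables in chain order (nested loops; Spark would use distributed joins)."""
    results: List[Binding] = [{}]
    for pattern in chain_order(patterns, predicate_sizes(tables)):
        joined: List[Binding] = []
        for binding in results:
            for row in tables.get(pattern[1], []):
                extended = match(pattern, row, binding)
                if extended is not None:
                    joined.append(extended)
        results = joined
    return results


triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "neu"),
    ("bob", "worksAt", "neu"),
    ("carol", "worksAt", "acme"),
    ("neu", "locatedIn", "shenyang"),
]
query = [("?x", "knows", "?y"), ("?y", "worksAt", "?org"), ("?org", "locatedIn", "shenyang")]
print(evaluate(query, vertical_partition(triples)))
# prints [{'?org': 'neu', '?y': 'bob', '?x': 'alice'}]
```

On this toy data the `locatedIn` table has the fewest rows, so the chain starts there. It then joins `worksAt` through `?org` and finally `knows` through `?y`. As a result, the largest table is joined only against bindings that have already been narrowed.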
Pages: 219-224
Page count: 6