A Survey of Distributed RDF Data Management

Authors
Zou L. [1]
Peng P. [2]
Affiliations
[1] Institute of Computer Science & Technology, Peking University, Beijing
[2] College of Computer Science and Electronic Engineering, Hunan University, Changsha
Source
Science Press, 2017, 54
Keywords
Cloud computing; Distributed database system; Linked data; RDF data management; SPARQL query processing
DOI
10.7544/issn1000-1239.2017.20160908
Abstract
RDF (Resource Description Framework) is now widely used to expose, share, and connect pieces of data on the Web, and SPARQL (SPARQL Protocol and RDF Query Language) is the structured query language used to access RDF repositories. As RDF datasets grow, evaluating SPARQL queries over them exceeds the capacity of a single machine, so a high-performance distributed RDF database system is needed to evaluate SPARQL queries efficiently. A large body of work on distributed RDF data management exists, following a variety of approaches, and this paper provides an overview of it. The survey considers three kinds of distributed data management approaches: cloud-based approaches, partitioning-based approaches, and federated RDF systems. In brief, cloud-based approaches use existing cloud platforms to manage large RDF datasets; partitioning-based approaches divide an RDF graph into several fragments and place each fragment at a different site in a distributed system; and federated RDF systems do not allow the data to be re-partitioned, because it is already distributed over autonomous sites. For each kind of approach, further discussion is provided to help readers understand the characteristics of the different methods. © 2017, Science Press. All rights reserved.
Pages: 1213-1224
Page count: 11