Strabo 2: Distributed Management of Massive Geospatial RDF Datasets

Cited by: 1
Authors
Bilidas, Dimitris [1]
Ioannidis, Theofilos [1]
Mamoulis, Nikos [2]
Koubarakis, Manolis [1]
Affiliations
[1] Natl & Kapodistrian Univ Athens, Athens, Greece
[2] Univ Ioannina, Ioannina, Greece
Source
SEMANTIC WEB - ISWC 2022 | 2022 / Vol. 13489
Keywords
SPARQL
DOI
10.1007/978-3-031-19433-7_24
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present STRABO 2, a distributed geospatial RDF store able to process GeoSPARQL queries over massive RDF datasets. STRABO 2 is based on robust technologies, able to scale to TBs of data distributed over hundreds of nodes. Specifically, we use the Spark framework, enhanced with the geospatial library SEDONA, for distributed in-memory processing on Hadoop clusters, and Hive for compact persistent storage of RDF data. STRABO 2 employs a flexible design that can store and partition thematic RDF data using different relational schemas, and spatial data in a separate Hive table, by taking into consideration the GeoSPARQL vocabulary. STRABO 2 is cluster-friendly in terms of both memory and disk, since it compresses triples using a partial encoding technique in addition to the compression schemes of the Parquet data file format. GeoSPARQL queries are translated into the Spark SQL dialect, enhanced with the spatial functions and predicates offered by SEDONA. During this process the system takes into consideration SEDONA's capabilities for both spatial selections and spatial joins, in order to apply optimizations that result in efficient query processing. We experimentally test STRABO 2 on an award-winning Hadoop-based cluster environment and demonstrate STRABO 2's excellent scalability while handling massive synthetic and real-world datasets. We also show that STRABO 2 clearly outperforms state-of-the-art centralized engines in a single-server setup once the dataset size increases beyond a few GBs.
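The storage idea in the abstract (thematic triples in relational tables, geometries in a separate table, terms compressed by encoding) can be illustrated with a minimal, single-machine sketch. This is not the system's implementation: the function name `partition_triples`, the per-predicate layout, and the plain dictionary encoding are assumptions chosen for illustration, and the paper's actual schemas and partial encoding technique are not described in this record. The only GeoSPARQL-specific element used is the standard `geo:asWKT` property.

```python
from collections import defaultdict

# GeoSPARQL property that attaches a WKT literal to a geometry resource.
AS_WKT = "http://www.opengis.net/ont/geosparql#asWKT"


def partition_triples(triples):
    """Split (subject, predicate, object) triples into per-predicate
    thematic tables and one separate geometry table.

    Terms in thematic tables are replaced by integer ids. This is plain
    dictionary encoding, used here as a stand-in (an assumption) for the
    partial encoding mentioned in the abstract.
    """
    dictionary = {}               # term -> integer id
    thematic = defaultdict(list)  # predicate -> [(subject_id, object_id), ...]
    geometries = []               # [(geometry_id, wkt_string), ...]

    def encode(term):
        # Ids are assigned in order of first appearance.
        return dictionary.setdefault(term, len(dictionary))

    for s, p, o in triples:
        if p == AS_WKT:
            # Keep the WKT text unencoded so a spatial library can parse it.
            geometries.append((encode(s), o))
        else:
            thematic[p].append((encode(s), encode(o)))
    return dictionary, dict(thematic), geometries


if __name__ == "__main__":
    sample = [
        ("ex:athens", "geo:hasGeometry", "ex:g1"),
        ("ex:g1", AS_WKT, "POINT(23.7 37.9)"),
        ("ex:athens", "rdfs:label", "Athens"),
    ]
    terms, tables, geoms = partition_triples(sample)
    print(tables)
    print(geoms)
```

In a distributed setting, each per-predicate table and the geometry table would correspond to separate stored tables, so that a spatial filter only needs to scan the geometry table before joining back to thematic data.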
Pages: 411-427
Number of pages: 17