Data Partitioning Scheme for Efficient Distributed RDF Querying Using Apache Spark

Times cited: 9
Authors
Hassan, Mahmudul [1]
Bansal, Srividya K. [1]
Affiliation
[1] Arizona State Univ, Decis Syst Engn, Informat, Sch Comp, Tempe, AZ 85281 USA
Keywords
Resource Description Framework; Semantic Web; SPARQL Querying; Data Partitioning; Spark; ENGINE
DOI
10.1109/ICOSC.2019.8665614
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The rapid growth of semantic data in the form of Resource Description Framework (RDF) triples demands efficient, scalable, distributed storage and parallel processing strategies, along with high availability and fault tolerance, for its management and reuse. Existing distributed RDF data management systems leave three open issues that are not well addressed together. The first is query efficiency; the second is that solutions are optimized for certain types of query patterns and do not necessarily work well for all of them; the third is the need to reduce pre-processing and data loading times. To address these issues, we propose a relational partitioning scheme for RDF data called Subset Property Table (SPT), which further partitions the existing Property Table approach into subsets of tables to minimize query input and join operations. We combine SPT with another existing model, Vertical Partitioning (VP), for storing RDF datasets, and demonstrate that the combined (SPT + VP) approach outperforms state-of-the-art systems based on an in-memory processing engine in a distributed environment.
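To make the three storage layouts named in the abstract concrete, here is a minimal single-machine sketch in plain Python rather than Spark. Vertical Partitioning (VP) keeps one two-column (subject, object) table per predicate, and a Property Table (PT) keeps one row per subject with one column per predicate. The abstract does not spell out how SPT splits the Property Table, so the layout below is an assumption: one subset table per predicate, holding only the PT rows whose subject has that predicate. All function names are hypothetical. The sketch compares a star query (subjects having every given predicate) on both layouts. On SPT it scans a single, smallest subset table and needs no join. On VP it has to intersect subjects across per-predicate tables.

```python
from collections import defaultdict


def vertical_partition(triples):
    """VP: one (subject, object) table per predicate."""
    vp = defaultdict(list)
    for s, p, o in triples:
        vp[p].append((s, o))
    return dict(vp)


def property_table(triples):
    """PT: one row per subject; each predicate column holds a list of objects
    (lists are this sketch's way of handling multi-valued predicates)."""
    pt = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        pt[s][p].append(o)
    return {s: dict(cols) for s, cols in pt.items()}


def subset_property_tables(pt):
    """Assumed SPT layout (hypothetical): for each predicate p, the subset of
    PT rows whose subject has p. A real store would copy rows into each subset
    table; here the subset tables share references to the same row dicts."""
    spt = defaultdict(dict)
    for s, row in pt.items():
        for p in row:
            spt[p][s] = row
    return dict(spt)


def star_query_spt(spt, predicates):
    """Subjects having all given predicates (non-empty list): scan only the
    smallest subset table and filter its rows, with no join."""
    first = min(predicates, key=lambda p: len(spt.get(p, {})))
    return sorted(
        s for s, row in spt.get(first, {}).items()
        if all(p in row for p in predicates)
    )


def star_query_vp(vp, predicates):
    """The same star query over VP (non-empty list): a subject-subject join,
    expressed here as an intersection of subject sets."""
    subject_sets = [{s for s, _ in vp.get(p, [])} for p in predicates]
    return sorted(set.intersection(*subject_sets))


if __name__ == "__main__":
    triples = [
        ("alice", "name", "Alice"),
        ("alice", "worksFor", "asu"),
        ("bob", "name", "Bob"),
        ("asu", "name", "ASU"),
    ]
    spt = subset_property_tables(property_table(triples))
    print(star_query_spt(spt, ["name", "worksFor"]))  # ['alice']
    print(star_query_vp(vertical_partition(triples), ["name", "worksFor"]))  # ['alice']
```

On Spark, each of these tables would more plausibly be a DataFrame partitioned across the cluster. The point of the sketch is only the trade-off: SPT reads fewer rows for multi-predicate star patterns, while VP stays compact for single-predicate lookups.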
Pages: 24-31
Number of pages: 8
Related papers
  • [1] Linked Data Partitioning for RDF Processing on Apache Spark
    Atashkar, Amir Hossein
    Ghadiri, Nasser
    Joodaki, Mehdi
    [J]. 2017 3RD INTERNATIONAL CONFERENCE ON WEB RESEARCH (ICWR), 2017, : 73 - 77
  • [2] Distributed RDF Archives Querying with Spark
    Bahri, Afef
    Laajimi, Meriem
    Ayadi, Nadia Yacoubi
    [J]. SEMANTIC WEB: ESWC 2018 SATELLITE EVENTS, 2018, 11155 : 451 - 465
  • [3] Efficient RDF Knowledge Graph Partitioning Using Querying Workload
    Akhter, Adnan
    Saleem, Muhammad
    Bigerl, Alexander
    Ngomo, Axel-Cyrille Ngonga
    [J]. PROCEEDINGS OF THE 11TH KNOWLEDGE CAPTURE CONFERENCE (K-CAP '21), 2021, : 169 - 176
  • [4] Incremental Data Partitioning of RDF Data in SPARK
    Agathangelos, Giannis
    Troullinou, Georgia
    Kondylakis, Haridimos
    Stefanidis, Kostas
    Plexousakis, Dimitris
    [J]. SEMANTIC WEB: ESWC 2018 SATELLITE EVENTS, 2018, 11155 : 50 - 54
  • [5] RDF packages: a scheme for efficient reasoning and querying over large-scale RDF data
    Ohsawa, Shohei
    Amagasa, Toshiyuki
    Kitagawa, Hiroyuki
    [J]. INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS, 2012, 8 (02) : 212 ff.
  • [6] Semantic Querying Big and Distributed RDF Data
    Kaoutar, Lamrani
    Abderrahim, Ghadi
    Kudagba, Florent Kunale
    [J]. PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON SMART CITY APPLICATIONS (SCA'18), 2018
  • [7] S3QLRDF: Property Table Partitioning Scheme for Distributed SPARQL Querying of large-scale RDF data
    Hassan, Mahmudul
    Bansal, Srividya K.
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON SMART DATA SERVICES (SMDS 2020), 2020, : 133 - 140
  • [8] Querying distributed RDF data sources with SPARQL
    Quilitz, Bastian
    Leser, Ulf
    [J]. SEMANTIC WEB: RESEARCH AND APPLICATIONS, PROCEEDINGS, 2008, 5021 : 524 - 538
  • [9] Query Answering On Uncertain Big RDF Data Using Apache Spark Framework
    Benbernou, Salima
    Ouziri, Mourad
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 4854 - 4860
  • [10] A Robust Distributed Big Data Clustering-based on Adaptive Density Partitioning using Apache Spark
    Hosseini, Behrooz
    Kiani, Kourosh
    [J]. SYMMETRY-BASEL, 2018, 10 (08)