Efficient Distributed SPARQL Queries on Apache Spark

Authors
Albahli, Saleh [1]
Affiliations
[1] Qassim Univ, Coll Comp, Buraydah, Saudi Arabia
Keywords
Semantic web; RDF; SPARQL; SPARK; GraphX; triple patterns
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
RDF is a widely accepted framework for describing metadata on the web, owing to its simplicity and its universal graph-like data model. The sheer volume of RDF data now available makes existing query techniques unsuitable. To address this, we use the processing power of Apache Spark to load and query a large dataset much more quickly than classical approaches. We designed experiments to evaluate the performance of several queries, ranging from selecting a single attribute to selecting, filtering, and sorting multiple attributes in the dataset. We also ran distributed SPARQL queries on Apache Spark GraphX and studied the stages of the query pipeline, which gave insight into which stages can be improved. The pipeline comprises three stages: graph loading, basic graph pattern matching, and result calculation. Our goal is to minimize the time spent in the graph loading stage in order to improve overall performance and cut the cost of data loading.
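The three pipeline stages named in the abstract can be illustrated with a minimal, single-machine sketch in plain Python. This is not the paper's implementation and does not use Spark or GraphX; the function names (`load_graph`, `match_bgp`, `compute_results`), the whitespace-separated triple format, and the sample data are all assumptions made for illustration only.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def load_graph(lines: Iterable[str]) -> List[Triple]:
    """Stage 1 (graph loading): parse 'subject predicate object' lines.

    The whitespace-separated format is an assumption, not an RDF syntax.
    """
    triples = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def match_pattern(pattern: Triple, triples: List[Triple],
                  binding: Binding) -> List[Binding]:
    """Extend one partial binding with every triple that fits the pattern."""
    results = []
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            results.append(candidate)
    return results


def match_bgp(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Stage 2 (basic graph pattern): join the triple patterns one by one."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings
                    for b2 in match_pattern(pattern, triples, b)]
    return bindings


def compute_results(bindings: List[Binding], select: List[str], order_by: str,
                    keep: Callable[[Binding], bool] = lambda b: True
                    ) -> List[Tuple[str, ...]]:
    """Stage 3 (result calculation): filter, sort, and project the bindings."""
    rows = sorted((b for b in bindings if keep(b)), key=lambda b: b[order_by])
    return [tuple(b[v] for v in select) for b in rows]


# Hypothetical sample data and query, for illustration only.
raw = [
    "alice knows bob",
    "bob knows carol",
    "alice age 30",
    "bob age 25",
    "carol age 41",
]
triples = load_graph(raw)
bgp = [("?x", "knows", "?y"), ("?y", "age", "?a")]
rows = compute_results(match_bgp(bgp, triples), ["?x", "?y", "?a"], "?a")
```

In this sketch the loading stage is a single pass over the input, whereas a distributed engine would also have to partition the graph and build its vertex and edge structures, which is consistent with the abstract's focus on loading time.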
Pages: 564-568
Page count: 5
Source
International Journal of Advanced Computer Science and Applications, 2019, 10(08): 564-568