Linked Data Partitioning for RDF Processing on Apache Spark

Authors
Atashkar, Amir Hossein [1]
Ghadiri, Nasser [1]
Joodaki, Mehdi [1]
Affiliations
[1] Isfahan Univ Technol, Dept Elect & Comp Engn, Esfahan, Iran
Keywords
Linked data; scalable algorithms; NoSQL; big data
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
RDF models are widely used in the web of data because of their flexibility and their similarity to graph patterns. As the use of RDF grows, so do the volume and content of RDF datasets. Processing such massive amounts of data on a single machine is therefore not efficient enough, owing to response time and limited hardware resources. A common way to overcome this limitation is cluster processing, and huge datasets can benefit from distributed cluster processing on Apache Hadoop. However, because Hadoop relies heavily on hard disks, its processing time is usually inadequate. In this paper, we propose a partitioning approach based on Apache Spark for rapid processing of RDF data models. A key feature of Apache Spark is that it uses main memory instead of hard disk, which improves the speed of data processing in our method. We evaluated the proposed method by running SQL queries on RDF data partitioned across the cluster, and the results demonstrate improved performance.
Pages: 73-77
Page count: 5