Parallel computation of PDFs on big spatial data using Spark

Cited by: 3
Authors
Liu, Ji [1, 2]
Lemus, Noel Moreno [3]
Pacitti, Esther [1, 2]
Porto, Fabio [3]
Valduriez, Patrick [1, 2]
Affiliations
[1] INRIA, Montpellier, France
[2] Univ Montpellier, LIRMM, Montpellier, France
[3] LNCC Petropolis, Petropolis, RJ, Brazil
Keywords
Spatial data; Big data; Parallel processing; Spark; Simulation
DOI
10.1007/s10619-019-07260-3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We consider big spatial data, which is typically produced in scientific areas such as geological or seismic interpretation. The spatial data can be produced by observation (e.g., using sensors or soil instruments) or by numerical simulation programs, and corresponds to points that represent a 3D soil cube area. However, errors in signal processing and modeling create some uncertainty, and thus a lack of accuracy in identifying geological or seismic phenomena. Such uncertainty must be carefully analyzed. To analyze uncertainty, the main solution is to compute a Probability Density Function (PDF) of each point in the spatial cube area. However, computing PDFs on big spatial data can be very time-consuming (from several hours to even months on a computer cluster). In this paper, we propose a new solution to efficiently compute such PDFs in parallel using Spark, with three methods: data grouping, machine learning prediction and sampling. We evaluate our solution by extensive experiments on different computer clusters using big data ranging from hundreds of GB to several TB. The experimental results show that our solution scales up very well and can reduce the execution time by a factor of 33 (in the order of seconds or minutes) compared with a baseline method.
Pages: 63 - 100
Page count: 38
Related papers
50 in total
  • [1] Parallel computation of PDFs on big spatial data using Spark
    Ji Liu
    Noel Moreno Lemus
    Esther Pacitti
    Fabio Porto
    Patrick Valduriez
    [J]. Distributed and Parallel Databases, 2020, 38 : 63 - 100
  • [2] A Parallel Distributed Weka Framework for Big Data Mining using Spark
    Koliopoulos, Aris-Kyriakos
    Yiapanis, Paraskevas
    Tekiner, Firat
    Nenadic, Goran
    Keane, John
    [J]. 2015 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2015, 2015: 9 - 16
  • [3] MiCS-P: Parallel mutual-information computation of big categorical data on spark
    Li, Junli
    Zhang, Chaowei
    Zhang, Jifu
    Qin, Xiao
    Hu, Lihua
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2022, 161 : 118 - 129
  • [4] Big Spatial Data Processing With Apache Spark
    Boyi Shangguan
    Peng Yue
    Wu, Zhaoyan
    Jiang, Liangcun
    [J]. 2017 6TH INTERNATIONAL CONFERENCE ON AGRO-GEOINFORMATICS, 2017: 239 - 242
  • [5] Efficient Big Image Data Retrieval Using Clustering Index and Parallel Computation
    Su, Ja-Hwung
    Chin, Chu-Yu
    Li, Jyun-Yu
    Tseng, Vincent S.
    [J]. 2017 IEEE 8TH INTERNATIONAL CONFERENCE ON AWARENESS SCIENCE AND TECHNOLOGY (ICAST), 2017: 182 - 187
  • [6] An efficient parallel indexing structure for multi-dimensional big data using spark
    Elmeiligy, Manar A.
    El Desouky, Ali I.
    Elghamrawy, Sally M.
    [J]. JOURNAL OF SUPERCOMPUTING, 2021, 77 (10): 11187 - 11214
  • [7] The Parallel Fuzzy C-Median Clustering Algorithm Using Spark for the Big Data
    Alam Mallik, Moksud
    Fariza Zulkurnain, Nurul
    Siddiqui, Sumrana
    Sarkar, Rashel
    [J]. IEEE Access, 2024, 12 : 151785 - 151804
  • [8] An efficient parallel indexing structure for multi-dimensional big data using spark
    Manar A. Elmeiligy
    Ali I. El Desouky
    Sally M. Elghamrawy
    [J]. The Journal of Supercomputing, 2021, 77 : 11187 - 11214
  • [9] Using Parallel Hierarchical Clustering to Address Spatial Big Data Challenges
    Woodley, Alan
    Tang, Ling-Xiang
    Geva, Shlomo
    Nayak, Richi
    Chappell, Timothy
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016: 2692 - 2698
  • [10] Towards Parallel Spatial Query Processing for Big Spatial Data
    Zhong, Yunqin
    Han, Jizhong
    Zhang, Tieying
    Li, Zhenhua
    Fang, Jinyun
    Chen, Guihai
    [J]. 2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012: 2085 - 2094