Performance Evaluation of Spatial Data Management Systems Using GeoSpark

Cited by: 3
Authors
Shin, Hansub [1]
Lee, Kisung [2]
Kwon, Hyuk-Yoon [1]
Affiliations
[1] Seoul Natl Univ Sci & Technol, Dept Ind & Syst Engn, Seoul, South Korea
[2] Louisiana State Univ, Div Comp Sci & Engn, Baton Rouge, LA 70803 USA
Funding
National Research Foundation of Singapore
Keywords
Large-scale spatial data; GeoSpark; Performance evaluation; Distributed environments; Big data
DOI
10.1109/BigComp48618.2020.00-75
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In this paper, we evaluate the performance of spatial data management systems in distributed computing environments. Because several studies report that GeoSpark outperforms other spatial systems in many scenarios, we focus this evaluation on spatial data management systems built on GeoSpark. Although GeoSpark supports various storage engines as its underlying data store, the effect of the storage engine on spatial data processing has not been well studied. To address this limitation, we evaluate the performance of GeoSpark using two underlying data stores: 1) HDFS and 2) MongoDB. We first design and build distributed experimental environments based on Amazon EC2 and EMR using up to 10 nodes. Through extensive experiments on three synthetic and real data sets, we show that the overall performance of both HDFS- and MongoDB-based GeoSpark improves as the number of nodes increases. We also show that HDFS-based GeoSpark generally outperforms MongoDB-based GeoSpark, especially for large-scale data sets. In addition, we demonstrate that the proper use of caching on HDFS-based GeoSpark can improve the overall query processing performance by up to three orders of magnitude.
Pages: 197-200
Page count: 4