Performance Evaluation of Spatial Data Management Systems Using GeoSpark

Cited by: 3
Authors
Shin, Hansub [1]
Lee, Kisung [2]
Kwon, Hyuk-Yoon [1]
Affiliations
[1] Seoul Natl Univ Sci & Technol, Dept Ind & Syst Engn, Seoul, South Korea
[2] Louisiana State Univ, Div Comp Sci & Engn, Baton Rouge, LA 70803 USA
Funding
National Research Foundation of Singapore
Keywords
Large-scale spatial data; GeoSpark; Performance evaluation; Distributed environments; Big data
DOI
10.1109/BigComp48618.2020.00-75
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
In this paper, we evaluate the performance of spatial data management systems in distributed computing environments. Because several studies report that GeoSpark outperforms other spatial systems in many scenarios, we choose GeoSpark-based spatial data management systems for this evaluation. Although GeoSpark supports various storage engines as its underlying data store, the effects of the storage engine on spatial data processing have not been well studied. To address this limitation, we evaluate the performance of GeoSpark using two underlying data stores: 1) HDFS and 2) MongoDB. We first design and build distributed experimental environments based on Amazon EC2 and EMR using up to 10 nodes. Through extensive experiments on three synthetic and real data sets, we show that the overall performance of both HDFS- and MongoDB-based GeoSpark improves as we increase the number of nodes. We also show that HDFS-based GeoSpark generally outperforms MongoDB-based GeoSpark, especially for large-scale data sets. In addition, we demonstrate that the proper use of caching on HDFS-based GeoSpark can improve the overall query processing performance by up to three orders of magnitude.
Pages: 197-200
Page count: 4
Related papers
50 in total
  • [21] Performance Analysis of Spatial Data Broadcast for Navigation Systems
    Lo, Shou-Chih
    2006 IEEE 63RD VEHICULAR TECHNOLOGY CONFERENCE, VOLS 1-6, 2006, : 861 - 865
  • [22] A Performance Evaluation of Hive for Scientific Data Management
    Liu, Taoying
    Liu, Jing
    Liu, Hong
    Li, Wei
    2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2013,
  • [23] EVALUATION OF REGIONAL INNOVATION SYSTEMS PERFORMANCE USING DATA ENVELOPMENT ANALYSIS (DEA)
    Vechkinzova, Elena
    Petrenko, Yelena
    Bencic, Stanislav
    Ulybyshev, Dmitriy
    Zhailauov, Yerlan
    ENTREPRENEURSHIP AND SUSTAINABILITY ISSUES, 2019, 7 (01): : 498 - 509
  • [24] Performance Evaluation of HDFS in Big Data Management
    Dev, Dipayan
    Patgiri, Ripon
    2014 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND APPLICATIONS (ICHPCA), 2014,
  • [25] Performance evaluation of Data Distribution Management strategies
    Boukerche, A
    Dzermajko, C
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2004, 16 (15): : 1545 - 1573
  • [26] A data warehouse for performance management of distributed systems
    Sriram, C
    Martin, P
    Powley, W
    PROCEEDINGS OF THE IEEE THIRD INTERNATIONAL WORKSHOP ON SYSTEMS MANAGEMENT, 1998, : 68 - 77
  • [27] Using Linked Data for Systems Management
    Feridun, Metin
    Tanner, Axel
    PROCEEDINGS OF THE 2010 IEEE-IFIP NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM, 2010, : 926 - 929
  • [28] Evaluation of Performance Management Systems for Knowledge Workers
    Shott, Tom
    Imondi, Chris
    Fedie, Ryan
    PICMET '12: PROCEEDINGS - TECHNOLOGY MANAGEMENT FOR EMERGING TECHNOLOGIES, 2012, : 3613 - 3630
  • [29] Performance evaluation of energy systems with data reconciliation
    Lozano, M.A.
    Remiro, J.A.
    Informacion Tecnologica, 2001, 12 (02): : 99 - 104
  • [30] PERFORMANCE EVALUATION OF DATA COMMUNICATION-SYSTEMS
    REISER, M
    PROCEEDINGS OF THE IEEE, 1982, 70 (02) : 171 - 196