Distributed In Situ Processing of Big Raster Data in the Cloud

Cited by: 3
Author
Zalipynis, Ramon Antonio Rodriges [1]
Affiliation
[1] National Research University Higher School of Economics, Moscow, Russia
Funding
Russian Foundation for Basic Research
Keywords
Big raster data; Climate reanalysis; Distributed systems; Cloud computing; SciDB; Array DBMS; In situ; NetCDF operators
DOI
10.1007/978-3-319-74313-4_24
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
A raster is the primary data type in Earth science, geology, remote sensing, and other fields experiencing tremendous growth in data volumes. An array DBMS is one option for tackling big raster data processing. However, raster data are traditionally stored in files, not in databases. Command line tools for processing raster files have long been developed; most are feature-rich and free but optimized for a single machine. This paper proposes new techniques for distributed processing of raster data directly in diverse file formats by delegating considerable portions of the work to such tools. An N-dimensional array data model is proposed to maintain independence from the files and the tools. In addition, a new scheme named GROUP-APPLY-FINALLY is presented to universally express the majority of raster data processing operations and streamline their distributed execution. The new approaches make it possible to provide a rich collection of raster operations at scale and to outperform SciDB by over 410x on average on climate reanalysis data. SciDB is the only freely available distributed array DBMS to date. Experiments were carried out on 8- and 16-node clusters in the Microsoft Azure cloud.
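The abstract names the GROUP-APPLY-FINALLY scheme but does not define it. Purely as an illustration, and not as the paper's method, the sketch below reads the name literally on a hypothetical time-mean over a (time, lat, lon) cube of the kind found in climate reanalysis data: GROUP splits the array into chunks along the time axis, APPLY computes a partial (sum, count) per chunk, and FINALLY merges the partials into the mean. The functions `group_chunks`, `apply_partial`, `finally_merge` and `time_mean` are hypothetical; a thread pool stands in for cluster nodes, and the paper's delegation of work to command line tools is not modeled.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor

import numpy as np


def group_chunks(cube: np.ndarray, chunk_len: int) -> list[np.ndarray]:
    """GROUP (hypothetical reading): split the cube into chunks along axis 0 (time)."""
    return [cube[i:i + chunk_len] for i in range(0, cube.shape[0], chunk_len)]


def apply_partial(chunk: np.ndarray) -> tuple[np.ndarray, int]:
    """APPLY (hypothetical reading): per-chunk partial result (sum over time, count)."""
    return chunk.sum(axis=0), chunk.shape[0]


def finally_merge(partials: list[tuple[np.ndarray, int]]) -> np.ndarray:
    """FINALLY (hypothetical reading): combine partial sums and counts into a mean."""
    total = np.zeros_like(partials[0][0], dtype=np.float64)
    count = 0
    for partial_sum, n in partials:
        total += partial_sum
        count += n
    return total / count


def time_mean(cube: np.ndarray, chunk_len: int, workers: int = 4) -> np.ndarray:
    """Mean over the time axis; threads stand in for cluster nodes (assumption)."""
    chunks = group_chunks(cube, chunk_len)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(apply_partial, chunks))
    return finally_merge(partials)


if __name__ == "__main__":
    # Toy (time=10, lat=3, lon=4) cube; the last chunk has only 1 time step.
    cube = np.arange(10 * 3 * 4, dtype=np.float64).reshape(10, 3, 4)
    result = time_mean(cube, chunk_len=3)
    print(result.shape, np.allclose(result, cube.mean(axis=0)))  # (3, 4) True
```

Carrying a (sum, count) pair rather than per-chunk means keeps the final result exact when chunks have unequal lengths, which is why a separate combining step is useful for this kind of aggregate. The sketch assumes a non-empty time axis.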
Pages: 337-351
Page count: 15
Related papers
50 in total
  • [1] Ganos Aero: A Cloud-Native System for Big Raster Data Management and Processing
    Xiao, Fei
    Xie, Jiong
    Chen, Zhida
    Li, Feifei
    Chen, Zhen
    Liu, Jianwei
    Liu, Yinpei
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (12): 3966-3969
  • [2] Distributed Zonal Statistics of Big Raster and Vector Data
    Singla, Samriddhi
    Eldawy, Ahmed
    [J]. 26TH ACM SIGSPATIAL INTERNATIONAL CONFERENCE ON ADVANCES IN GEOGRAPHIC INFORMATION SYSTEMS (ACM SIGSPATIAL GIS 2018), 2018: 536-539
  • [3] Efficient and Customizable Data Partitioning Framework for Distributed Big RDF Data Processing in the Cloud
    Lee, Kisung
    Liu, Ling
    Tang, Yuzhe
    Zhang, Qi
    Zhou, Yang
    [J]. 2013 IEEE SIXTH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD 2013), 2013: 327-334
  • [4] RASTER DATA PARTITIONING FOR SUPPORTING DISTRIBUTED GIS PROCESSING
    Binh Nguyen Thai
    Olasz, Angela
    [J]. ISPRS GEOSPATIAL WEEK 2015, 2015, 40-3 (W3): 543-551
  • [5] BIG DATA PROCESSING TUNING IN THE CLOUD
    Sabharwal, Satwik
    Malhotra, Nishchay
    Singh, Ajay Shanker
    [J]. 2015 INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION & AUTOMATION (ICCCA), 2015: 699-703
  • [6] Cloud Computing for Big Data Processing
    Li, Xiaofang
    Zhuang, Yanbin
    Yang, Simon X.
    [J]. INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2017, 23 (04): 545-546
  • [7] Array DBMS and Satellite Imagery: Towards Big Raster Data in the Cloud
    Zalipynis, Ramon Antonio Rodriges
    Pozdeev, Evgeniy
    Bryukhov, Anton
    [J]. ANALYSIS OF IMAGES, SOCIAL NETWORKS AND TEXTS, AIST 2017, 2018, 10716: 267-279
  • [8] Big Data Processing in Cloud Environments
    Tsuchiya, Satoshi
    Sakamoto, Yoshinori
    Tsuchimoto, Yuichi
    Lee, Vivian
    [J]. FUJITSU SCIENTIFIC & TECHNICAL JOURNAL, 2012, 48 (02): 159-168
  • [9] Developing the Raster Big Data Benchmark: A Comparison of Raster Analysis on Big Data Platforms
    Haynes, David
    Mitchell, Philip
    Shook, Eric
    [J]. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2020, 9 (11)
  • [10] Raptor: Large Scale Processing of Big Raster + Vector Data
    Singla, Samriddhi
    [J]. SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021: 2905-2907