A spatiotemporal indexing approach for efficient processing of big array-based climate data with MapReduce

Cited by: 40
Authors
Li, Zhenlong [1, 2]
Hu, Fei [1]
Schnase, John L. [3]
Duffy, Daniel Q. [4]
Lee, Tsengdar [5]
Bowen, Michael K. [4]
Yang, Chaowei [1]
Affiliations
[1] George Mason Univ, NSF Spatiotemporal Innovat Ctr, Fairfax, VA 22030 USA
[2] Univ South Carolina, Dept Geog, Columbia, SC 29208 USA
[3] NASA Goddard Space Flight Ctr, Off Computat & Informat Sci & Technol, Greenbelt, MD USA
[4] NASA, Goddard Space Flight Ctr, Ctr Climate Simulat, Greenbelt, MD USA
[5] NASA, Earth Sci Div, Washington, DC 20546 USA
Funding
U.S. National Science Foundation
Keywords
Spatiotemporal index; big climate data; array-based; Hadoop MapReduce; HDFS; NASA MERRA; climate change
DOI
10.1080/13658816.2015.1131830
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Climate observations and model simulations are producing vast amounts of array-based spatiotemporal data. Efficient processing of these data is essential for assessing global challenges such as climate change, natural disasters, and diseases. This is challenging not only because of the large data volume but also because of the intrinsically high-dimensional nature of geoscience data. To tackle this challenge, we propose a spatiotemporal indexing approach to efficiently manage and process big climate data with MapReduce in a highly scalable environment. Using this approach, big climate data are stored directly in the Hadoop Distributed File System (HDFS) in their original, native file format. A spatiotemporal index is built to bridge the logical array-based data model and the physical data layout, which enables fast data retrieval when performing spatiotemporal queries. Based on the index, a data-partitioning algorithm is applied that enables MapReduce to achieve high data locality while also balancing the workload. The proposed indexing approach is evaluated using the National Aeronautics and Space Administration (NASA) Modern-Era Retrospective Analysis for Research and Applications (MERRA) climate reanalysis dataset. The experimental results show that the index can significantly accelerate querying and processing (an approximately 10x speedup compared with a baseline test on the same computing cluster) while keeping the index-to-data ratio small (0.0328%). The applicability of the indexing approach is demonstrated by a climate anomaly detection application deployed on a NASA Hadoop cluster. The approach can also support efficient processing of general array-based spatiotemporal data in various geoscience domains without special configuration of the Hadoop cluster.
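The abstract only summarizes how the index links the logical array model to the physical byte layout and how partitioning trades off locality against balance. As a rough illustration of that general idea, and not the authors' implementation, the sketch below assumes a flat in-memory index of hypothetical `ChunkEntry` records, each mapping a variable and time step to a file, a byte range, and the data nodes holding a replica. The hypothetical `query` function filters the index by time range (spatial bounds are omitted for brevity), and the hypothetical `assign_local` function uses an assumed greedy heuristic: each chunk, largest first, goes to the least-loaded node that already stores it, so every read is node-local while bytes per node stay roughly even. All names, paths, and the heuristic itself are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ChunkEntry:
    """Hypothetical index record linking a logical array chunk to its bytes."""
    variable: str           # variable name in the array model
    time_step: int          # logical time index of the chunk
    path: str               # file that stores the chunk (illustrative path)
    byte_offset: int        # where the chunk starts inside the file
    byte_length: int        # chunk size in bytes
    hosts: Tuple[str, ...]  # data nodes holding a replica of the chunk


def query(index: List[ChunkEntry], variable: str,
          t_start: int, t_end: int) -> List[ChunkEntry]:
    """Return entries for `variable` with t_start <= time_step <= t_end."""
    return [e for e in index
            if e.variable == variable and t_start <= e.time_step <= t_end]


def assign_local(entries: List[ChunkEntry]) -> Dict[str, List[ChunkEntry]]:
    """Greedy locality-aware partitioning (an assumed heuristic): each chunk,
    largest first, goes to the least-loaded host that already stores it."""
    load: Dict[str, int] = defaultdict(int)
    plan: Dict[str, List[ChunkEntry]] = defaultdict(list)
    for e in sorted(entries, key=lambda c: c.byte_length, reverse=True):
        # Ties on load are broken by host name so the result is deterministic.
        host = min(e.hosts, key=lambda h: (load[h], h))
        load[host] += e.byte_length
        plan[host].append(e)
    return dict(plan)


# Usage: four 100-byte time steps of a variable "T", packed two per file.
replicas = [("n1", "n2"), ("n1", "n2"), ("n2", "n3"), ("n1", "n3")]
index = [ChunkEntry("T", t, f"/data/T_{t // 2}.nc", (t % 2) * 100, 100, r)
         for t, r in enumerate(replicas)]
hits = query(index, "T", 1, 3)
plan = assign_local(hits)
for host, chunks in sorted(plan.items()):
    print(host, [c.time_step for c in chunks])
```

In a real Hadoop deployment the replica lists would come from HDFS block locations rather than being hard-coded, and the paper's own partitioning algorithm may differ from this greedy rule. For scale, the reported 0.0328% index-to-data ratio corresponds to about 0.33 GB of index per 1 TB of data.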
Pages: 17-35
Number of pages: 19
Related papers
  • [1] A spatiotemporal compression based approach for efficient big data processing on Cloud
    Yang, Chi
    Zhang, Xuyun
    Zhong, Changmin
    Liu, Chang
    Pei, Jian
    Ramamohanarao, Kotagiri
    Chen, Jinjun
    Journal of Computer and System Sciences, 2014, 80(8): 1563-1583
  • [2] Efficient Big Data Processing in Hadoop MapReduce
    Dittrich, Jens
    Quiane-Ruiz, Jorge-Arnulfo
    Proceedings of the VLDB Endowment, 2012, 5(12): 2014-2015
  • [3] A Lightweight Indexing Approach for Efficient Batch Similarity Processing with MapReduce
    Phan, T. N.
    Dang, T. K.
    SN Computer Science, 2020, 1(1)
  • [4] Big Data retrieval techniques based on Hash Indexing and MapReduce approach with NoSQL Database
    Gayathiri, N. R.
    Jaspher, David D.
    Natarajan, A. M.
    Proceedings of the 2019 International Conference on Advances in Computing & Communication Engineering (ICACCE-2019), 2019
  • [5] In-Mapper combiner based MapReduce algorithm for processing of big climate data
    Manogaran, Gunasekaran
    Lopez, Daphne
    Chilamkurti, Naveen
    Future Generation Computer Systems: The International Journal of eScience, 2018, 86: 433-445
  • [6] MapReduce-based storage and indexing for big health data
    Gayathiri, N. R.
    Natarajan, A. M.
    Concurrency and Computation: Practice & Experience, 2019, 31(14)
  • [7] A MapReduce-based scalable discovery and indexing of structured big data
    Singh, Hari
    Bawa, Seema
    Future Generation Computer Systems: The International Journal of eScience, 2017, 73: 32-43
  • [8] Efficient finer-grained incremental processing with MapReduce for big data
    Zhang, Liang
    Feng, Yuanyuan
    Shen, Peiyi
    Zhu, Guangming
    Wei, Wei
    Song, Juan
    Shah, Syed Afaq Ali
    Bennamoun, Mohammed
    Future Generation Computer Systems: The International Journal of eScience, 2018, 80: 102-111
  • [9] Prominence of MapReduce in BIG DATA Processing
    Pandey, Shweta
    Tokekar, Vrinda
    2014 Fourth International Conference on Communication Systems and Network Technologies (CSNT), 2014: 555-560
  • [10] Verifying Properties of MapReduce-Based Big Data Processing
    Zhang, Nan
    Wang, Meng
    Duan, Zhenhua
    Tian, Cong
    IEEE Transactions on Reliability, 2022, 71(1): 321-338