Kaleido: Enabling Efficient Scientific Data Processing on Big-Data Systems

Times cited: 0
Authors
Biookaghazadeh, Saman [1]
Zhou, Shujia [2]
Zhao, Ming [1]
Affiliations
[1] Arizona State Univ, Tempe, AZ 85281 USA
[2] Northrop Grumman, Baltimore, MD USA
Funding
U.S. National Science Foundation
Keywords
MapReduce
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Big-data systems are increasingly important for solving data-driven problems in many scientific domains. However, existing big-data systems cannot efficiently process self-describing data formats such as NetCDF, which scientific communities commonly use to distribute and share data. This limitation is a serious hurdle to the further adoption of big-data systems in scientific domains. This paper presents Kaleido, which addresses the problem by enabling big-data systems to store and process scientific data efficiently. Specifically, Kaleido enables Hadoop to store NetCDF data directly on HDFS and to process it in MapReduce through convenient APIs. It also enables Hive to run queries on NetCDF data transparently to users. Moreover, it employs optimizations tailored to scientific data, particularly dimension-aware layouts that allow subset queries targeting any dimension of a multi-dimensional dataset to execute efficiently. The paper presents a comprehensive evaluation of Kaleido using representative queries on a typical geoscience dataset. The results show that Kaleido achieves substantial speedups and space savings over existing solutions for storing and processing NetCDF data on Hadoop, and that it substantially outperforms state-of-the-art solutions for subset queries on scientific data.
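The abstract does not describe how Kaleido's dimension-aware layouts are built, so the following is only a hypothetical sketch of the general idea, not the paper's method. It assumes a gridded variable (for example time × lat × lon) stored as a flat row-major array, and it assumes one stored copy per dimension ordering. It counts how many separate contiguous reads a subset query that fixes one dimension needs under each ordering, and picks the copy in which that query is cheapest. All function names, the grid shape, and the one-copy-per-ordering strategy are illustrative assumptions.

```python
from itertools import product


def flat_offset(index, shape, order):
    """Row-major offset of `index` when dimensions are stored in `order` (outermost first).

    Hypothetical helper: `index` and `shape` are given in the variable's logical
    dimension order; `order` is the physical storage order of those dimensions.
    """
    offset = 0
    for d in order:
        offset = offset * shape[d] + index[d]
    return offset


def slice_offsets(shape, order, dim, value):
    """Sorted flat offsets of every cell whose dimension `dim` equals `value`."""
    ranges = [range(value, value + 1) if d == dim else range(n)
              for d, n in enumerate(shape)]
    return sorted(flat_offset(idx, shape, order) for idx in product(*ranges))


def count_runs(offsets):
    """Number of contiguous runs in sorted offsets, i.e. separate sequential reads."""
    runs = 0
    prev = None
    for off in offsets:
        if prev is None or off != prev + 1:
            runs += 1
        prev = off
    return runs


def best_layout(layouts, shape, dim, value):
    """Assumed strategy: serve the subset query from the stored copy needing the fewest reads."""
    return min(layouts,
               key=lambda order: count_runs(slice_offsets(shape, order, dim, value)))


if __name__ == "__main__":
    shape = (4, 3, 5)                     # hypothetical (time, lat, lon) grid
    time_major, lon_major = (0, 1, 2), (2, 0, 1)
    # Fixing lon=1 in a time-major copy touches 12 scattered cells.
    print(count_runs(slice_offsets(shape, time_major, 2, 1)))
    # The same slice in a lon-major copy is one contiguous read.
    print(count_runs(slice_offsets(shape, lon_major, 2, 1)))
    print(best_layout([time_major, lon_major], shape, 2, 1))
```

Keeping a full copy per ordering multiplies storage, so a practical system would likely use a more compact scheme; the sketch only shows why the physical storage order decides whether a slice along a given dimension is one sequential read or many scattered ones.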
Pages: 121-130
Number of pages: 10
Related papers (50 in total)
  • [1] Enabling Scientific Data Storage and Processing on Big-data Systems
    Biookaghazadeh, Saman
    Xu, Yiqi
    Zhou, Shujia
    Zhao, Ming
    [J]. PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015: 1978-1984
  • [2] BigCache for Big-data Systems
    Roger, Michel Angelo
    Xu, Yiqi
    Zhao, Ming
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2014: 189-194
  • [3] Data Modifications in Blockchain Architecture for Big-Data Processing
    Tulkinbekov, Khikmatullo
    Kim, Deok-Hwan
    [J]. SENSORS, 2023, 23 (21)
  • [4] A big-data processing framework for uncertainties in transportation data
    Yang, Jie
    Ma, Jun
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE 2015), 2015
  • [5] An Efficient Industrial Big-Data Engine
    Basanta-Val, Pablo
    [J]. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2018, 14 (04): 1361-1369
  • [6] Comparative Evaluation of Big-Data Systems on Scientific Image Analytics Workloads
    Mehta, Parmita
    Dorkenwald, Sven
    Zhao, Dongfang
    Kaftan, Tomer
    Cheung, Alvin
    Balazinska, Magdalena
    Rokem, Ariel
    Connolly, Andrew
    Vanderplas, Jacob
    AlSayyad, Yusra
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2017, 10 (11): 1226-1237
  • [7] Analysis and Optimization of Big-Data Stream Processing
    Vakilinia, Shahin
    Zhang, Xinyao
    Qiu, Dongyu
    [J]. 2016 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2016
  • [8] Failure Analysis and Prediction for Big-Data Systems
    Rosa, Andrea
    Chen, Lydia Y.
    Binder, Walter
    [J]. IEEE TRANSACTIONS ON SERVICES COMPUTING, 2017, 10 (06): 984-998
  • [9] Understanding Unsuccessful Executions in Big-Data Systems
    Rosa, Andrea
    Chen, Lydia Y.
    Binder, Walter
    [J]. 2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING, 2015: 741-744
  • [10] SPBD: Streamlining Big-Data Processing in Cloud Environments
    Tung Nguyen
    Jingwen Zhang
    Weisong Shi
    [J]. ZTE Communications, 2013, 11 (02): 30-37