Kaleido: Enabling Efficient Scientific Data Processing on Big-Data Systems

Cited by: 0

Authors:

Biookaghazadeh, Saman [1]

Zhou, Shujia [2]

Zhao, Ming [1]

Affiliations:

[1] Arizona State Univ, Tempe, AZ 85281 USA

[2] Northrop Grumman, Baltimore, MD USA

Source:

2017 INTERNATIONAL CONFERENCE ON NETWORKING, ARCHITECTURE, AND STORAGE (NAS) | 2017

Funding:

U.S. National Science Foundation;

Keywords:

MAPREDUCE;

DOI:

Not available

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Big-data systems are increasingly important for solving data-driven problems in many science domains. However, existing big-data systems cannot efficiently process self-describing data formats such as NetCDF, which are commonly used by scientific communities for data distribution and sharing. This limitation presents a serious hurdle to the further adoption of big-data systems by science domains. This paper presents Kaleido, a solution to this problem that enables big-data systems to efficiently store and process scientific data. Specifically, it enables Hadoop to store NetCDF data directly on HDFS and process it in MapReduce using convenient APIs. It also enables Hive to support queries on NetCDF data, transparently to the users. Moreover, it employs optimizations tailored to scientific data, particularly dimension-aware layouts, which allow efficient execution of subset queries targeting any dimension of a multi-dimensional dataset. The paper presents a comprehensive evaluation of Kaleido using representative queries on a typical geoscience dataset. The results show that Kaleido achieves substantial speedup and space saving compared to existing solutions for storing and processing NetCDF data on Hadoop, and that it also substantially outperforms the state-of-the-art solutions for supporting subset queries on scientific data.
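The abstract does not say how the dimension-aware layouts are implemented, so the following is only a generic illustration of why layout matters for subset queries, not Kaleido's method. It is a minimal NumPy sketch that assumes a small in-memory array standing in for a (time, lat, lon) variable; the helper names `build_layouts` and `subset` are hypothetical. It keeps one copy of the array per leading dimension, so a subset along any dimension is read as a single contiguous block.

```python
import numpy as np

def build_layouts(data):
    """Hypothetical helper: keep one C-contiguous copy per leading dimension."""
    layouts = {}
    for dim in range(data.ndim):
        # Move `dim` to the front, then copy so that axis is outermost in memory.
        layouts[dim] = np.ascontiguousarray(np.moveaxis(data, dim, 0))
    return layouts

def subset(layouts, dim, index):
    """Return the slice at `index` along `dim` from the layout ordered for `dim`."""
    return layouts[dim][index]

# Toy stand-in for a variable with dimensions (time, lat, lon).
data = np.arange(2 * 3 * 4).reshape(2, 3, 4)
layouts = build_layouts(data)

# Subset query on the last dimension (lon == 1).
lon_slice = subset(layouts, 2, 1)

# Same values as slicing the original array, but stored contiguously.
print(np.array_equal(lon_slice, data[:, :, 1]))
print(lon_slice.flags["C_CONTIGUOUS"], data[:, :, 1].flags["C_CONTIGUOUS"])
```

This naive sketch trades storage for read locality by duplicating the data, whereas the abstract reports space savings for Kaleido, so the actual system evidently uses a different design.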


Pages: 121-130

Page count: 10
