Enabling Scientific Data Storage and Processing on Big-data Systems

Authors
Biookaghazadeh, Saman [1]
Xu, Yiqi [2]
Zhou, Shujia [3]
Zhao, Ming [1]
Affiliations
[1] Arizona State Univ, Sch Comp Informat & Decis Syst Engn, Tempe, AZ 85287 USA
[2] Florida Int Univ, Sch Comp & Informat Sci, Miami, FL USA
[3] Northrop Grumman Informat Technol, Colorado Springs, CO USA
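Keywords
Scientific data; big data; NetCDF; Hadoop
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract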
Big-data systems are increasingly important for solving data-driven problems in many science domains, including the geosciences. However, existing big-data systems cannot support self-describing data formats such as NetCDF, which are commonly used by scientific communities for data distribution and sharing. This limitation is a serious hurdle to the further adoption of big-data systems by science domains and prevents scientific users from leveraging these systems to improve their productivity. This paper presents a solution to this problem by enabling big-data systems to directly store and process scientific data. Specifically, it enables Hadoop to efficiently store NetCDF data on HDFS and process them in MapReduce using convenient APIs. It also enables Hive to support standard queries on NetCDF data, transparently to users. The paper also presents an evaluation of the proposed solution using several representative queries on a typical geoscientific dataset. The results show that the proposed approach achieves substantial speedup (up to 20 times) and space saving (83% reduction) compared to the traditional approach, which must convert NetCDF data to CSV format before Hadoop and Hive can use them.
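The space cost of the CSV conversion mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation and does not read real NetCDF files; it assumes a synthetic 3-D grid (the function names `make_grid`, `grid_to_csv_bytes`, and `grid_to_binary_bytes` are hypothetical) and only shows why writing one text row per grid cell, with coordinates repeated on every row, tends to take more space than a packed array whose coordinates are implied by position.

```python
import csv
import io
import struct


def make_grid(n_time, n_lat, n_lon):
    """Build a small synthetic (time, lat, lon) grid of float values."""
    return [[[float(t * 100 + i * 10 + j) + 0.25 for j in range(n_lon)]
             for i in range(n_lat)]
            for t in range(n_time)]


def grid_to_csv_bytes(grid):
    """Flatten the grid to CSV: one row per cell, coordinates on every row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["time", "lat", "lon", "value"])
    for t, plane in enumerate(grid):
        for i, row in enumerate(plane):
            for j, value in enumerate(row):
                writer.writerow([t, i, j, value])
    return buf.getvalue().encode("utf-8")


def grid_to_binary_bytes(grid):
    """Pack only the values as 32-bit floats; position encodes the coordinates."""
    flat = [v for plane in grid for row in plane for v in row]
    return struct.pack(f"<{len(flat)}f", *flat)


# Hypothetical grid dimensions chosen only for illustration.
grid = make_grid(4, 8, 8)
csv_size = len(grid_to_csv_bytes(grid))
bin_size = len(grid_to_binary_bytes(grid))
print(f"CSV bytes: {csv_size}, packed binary bytes: {bin_size}")
```

The actual ratio depends on the dataset, its value precision, and any compression, so the sizes printed here say nothing about the 83% figure reported in the abstract.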
Pages: 1978-1984
Number of pages: 7