Enabling Scientific Data Storage and Processing on Big-data Systems

Cited by: 0
Authors
Biookaghazadeh, Saman [1]
Xu, Yiqi [2]
Zhou, Shujia [3]
Zhao, Ming [1]
Affiliations
[1] Arizona State Univ, Sch Comp Informat & Decis Syst Engn, Tempe, AZ 85287 USA
[2] Florida Int Univ, Sch Comp & Informat Sci, Miami, FL USA
[3] Northrop Grumman Informat Technol, Colorado Springs, CO USA
Keywords
Scientific data; big data; NetCDF; Hadoop
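DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812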
Abstract
Big-data systems are increasingly important for solving data-driven problems in many science domains, including the geosciences. However, existing big-data systems cannot support self-describing data formats such as NetCDF, which scientific communities commonly use for data distribution and sharing. This limitation is a serious hurdle to further adoption of big-data systems in science domains and prevents scientific users from leveraging these systems to improve their productivity. This paper addresses the problem by enabling big-data systems to store and process scientific data directly. Specifically, it enables Hadoop to efficiently store NetCDF data on HDFS and process it in MapReduce using convenient APIs. It also enables Hive to support standard queries on NetCDF data, transparently to users. The paper evaluates the proposed solution using several representative queries on a typical geoscientific dataset. The results show that the proposed approach achieves substantial speedup (up to 20 times) and space savings (83% reduction) compared to the traditional approach, which must convert NetCDF data to CSV format before Hadoop and Hive can use it.
Pages: 1978-1984
Page count: 7