SciSpark: Highly Interactive In-Memory Science Data Analytics

Authors
Wilson, Brian [1]
Palamuttam, Rahul [1, 3]
Whitehall, Kim [1]
Mattmann, Chris [1, 2]
Goodman, Alex [1]
Boustani, Maziyar [1]
Shah, Sujen [1]
Zimdars, Paul [1]
Ramirez, Paul [1]
Affiliations
[1] CALTECH, Jet Prop Lab, NASA, Pasadena, CA 91109 USA
[2] Univ Southern Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
[3] Stanford Univ, Palo Alto, CA 94304 USA
Funding
National Aeronautics and Space Administration (NASA)
Keywords
Apache Spark; in-memory distributed computing; large scientific datasets; SciSpark
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present further work on SciSpark, a Big Data framework that extends Apache Spark's in-memory parallel computing to scale scientific computations. SciSpark's current architecture and design include: time and space partitioning of high-resolution geo-grids from NetCDF3/4; a sciDataset class providing N-dimensional array operations in Scala/Java and CF-style variable attributes (an update of our prior sciTensor class); parallel computation of time-series statistical metrics; and an interactive front-end using science (code and visualization) Notebooks. We demonstrate how SciSpark achieves parallel ingest and time/space partitioning of Earth science satellite and model datasets. We illustrate the usability, extensibility, and early performance of SciSpark using several Earth science use cases, here presenting benchmarks for sciDataset Readers and parallel time-series analytics. A three-hour SciSpark tutorial was taught at an ESIP Federation meeting using a dozen "live" Notebooks.
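As a rough illustration of the time partitioning and parallel time-series statistics mentioned in the abstract, the sketch below splits a time-ordered stack of small grids into chunks, computes a partial cell-wise sum per chunk in parallel, and combines the partial results into a time mean. It does not use SciSpark, Apache Spark, or NetCDF; the function names (`partition_by_time`, `partial_sum`, `time_mean`), the chunk size, and the synthetic grid values are all assumptions made for this example, and a thread pool stands in for a distributed cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_by_time(grids, chunk_size):
    """Split a time-ordered list of 2-D grids into consecutive chunks."""
    return [grids[i:i + chunk_size] for i in range(0, len(grids), chunk_size)]


def partial_sum(chunk):
    """Return (cell-wise sum over the chunk, number of time steps in it)."""
    rows, cols = len(chunk[0]), len(chunk[0][0])
    total = [[0.0] * cols for _ in range(rows)]
    for grid in chunk:
        for r in range(rows):
            for c in range(cols):
                total[r][c] += grid[r][c]
    return total, len(chunk)


def time_mean(grids, chunk_size=2, workers=4):
    """Cell-wise mean over time, computed from per-chunk partial sums."""
    chunks = partition_by_time(grids, chunk_size)
    # "Map" step: each time chunk is reduced independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # "Reduce" step: merge the partial sums and counts.
    rows, cols = len(grids[0]), len(grids[0][0])
    total = [[0.0] * cols for _ in range(rows)]
    count = 0
    for part, n in partials:
        count += n
        for r in range(rows):
            for c in range(cols):
                total[r][c] += part[r][c]
    return [[value / count for value in row] for row in total]


# Four synthetic 2x2 grids, one per time step (made-up values, not real data).
grids = [[[float(t), float(t + 1)], [float(t + 2), float(t + 3)]] for t in range(4)]
mean_grid = time_mean(grids)
print(mean_grid)
```

Carrying a count alongside each partial sum lets chunks of unequal length be merged correctly, which is the usual way a mean is made to work as a parallel reduction.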
Pages: 2964-2973
Number of pages: 10