Shock: Active Storage for Multicloud Streaming Data Analysis

Cited by: 2
Authors
Bischof, Jared [1, 2]
Wilke, Andreas [1, 2]
Gerlach, Wolfgang [1, 2]
Harrison, Travis [1, 2]
Paczian, Tobias [1, 2]
Tang, Wei [3]
Trimble, William [1, 2]
Wilkening, Jared [4]
Desai, Narayan [5]
Meyer, Folker [1, 2]
Affiliations
[1] Argonne Natl Lab, Argonne, IL 60439 USA
[2] Univ Chicago, Chicago, IL 60637 USA
[3] Google Inc, Mountain View, CA USA
[4] Dramafever Inc, New York, NY USA
[5] Ericsson, San Jose, CA USA
Keywords
bioinformatics; metagenomics; active object store; distributed wide-area computing
DOI
10.1109/BDC.2015.40
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Access to data plays a major role in designing and performing efficient data computation and analysis in a distributed environment. Existing approaches access data via a variety of methods, each with benefits and drawbacks that depend on the use case. Our original use case was the computational analysis of environmental sequence data, or metagenomics. Many other workflows reduce the dataset size dramatically within the first few processing steps, owing to biologically motivated data compression. Metagenomic data, by contrast, compresses poorly, so metagenomic workflows add to the size of the dataset along the processing pipeline. Thus, wide-area, high-throughput access to the data is essential. To address this problem, we developed Shock, a data store for files, their associated metadata, and indexes that allow Shock to provide different views into the data. Shock comprises three major components: a web service that provides a RESTful API, backend data storage for files, and storage for object metadata. Shock has proven to be a stable data store for MG-RAST, an application that served over 40,000 users in 2014 on a server that houses more than 3 million data objects. Moreover, Shock provides both subselection and high-performance file transfer capabilities that serve most use cases.
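The abstract describes Shock as a store that keeps files, their metadata, and indexes that provide different views into the data, including subselection. The following is a minimal, hypothetical in-memory sketch of that idea, not Shock's implementation or its RESTful API: the names ToyObjectStore, Node, build_fasta_index, subselect and query are illustrative assumptions, and a FASTA record index is used only as one example of an index type. The byte dictionary stands in for file storage, the attributes dictionary for metadata storage, and the methods for the web service layer.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Node:
    """One stored object: raw file bytes plus free-form metadata (hypothetical model)."""
    data: bytes
    attributes: dict
    # index name -> list of (offset, length) byte ranges, one entry per record
    indexes: dict = field(default_factory=dict)


class ToyObjectStore:
    """In-memory stand-in for an object store holding files, metadata and indexes."""

    def __init__(self):
        self.nodes = {}

    def create(self, data: bytes, attributes: dict) -> str:
        """Store a file with its metadata and return a new random node id."""
        node_id = str(uuid.uuid4())
        self.nodes[node_id] = Node(data=data, attributes=dict(attributes))
        return node_id

    def build_fasta_index(self, node_id: str) -> int:
        """Record the byte range of every FASTA record ('>' header plus sequence lines)."""
        node = self.nodes[node_id]
        data = node.data
        starts = [0] if data.startswith(b">") else []
        pos = data.find(b"\n>")
        while pos != -1:
            starts.append(pos + 1)  # record begins at the '>' after the newline
            pos = data.find(b"\n>", pos + 1)
        ends = starts[1:] + [len(data)]
        node.indexes["record"] = [(s, e - s) for s, e in zip(starts, ends)]
        return len(starts)

    def subselect(self, node_id: str, first: int, last: int) -> bytes:
        """Return records first..last-1 by slicing only their indexed byte ranges."""
        node = self.nodes[node_id]
        ranges = node.indexes["record"][first:last]
        return b"".join(node.data[off:off + length] for off, length in ranges)

    def query(self, **criteria) -> list:
        """Return ids of nodes whose metadata matches every given key/value pair."""
        return [nid for nid, node in self.nodes.items()
                if all(node.attributes.get(k) == v for k, v in criteria.items())]
```

For example, after storing `b">r1\nACGT\n>r2\nGGCC\n>r3\nTTAA\n"` and building its record index, `subselect(node_id, 1, 3)` returns only the second and third records. Keeping the index as byte ranges means a subset request reads just the requested bytes, which is one plausible way to realize the kind of subselection the abstract mentions.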
Pages: 68-72
Number of pages: 5
Related papers (50 in total)
  • [41] A Toolkit for Streaming Process Data Analysis
    Dijkman, Remco M.
    Peters, Sander P. F.
    ter Hofstede, Arthur H. M.
    2016 IEEE 20TH INTERNATIONAL ENTERPRISE DISTRIBUTED OBJECT COMPUTING WORKSHOP (EDOCW), 2016: 304-312
  • [42] Online and Offline Analysis of Streaming Data
    Hoque, Sheik
    Miranskyy, Andriy
    2018 IEEE 15TH INTERNATIONAL CONFERENCE ON SOFTWARE ARCHITECTURE COMPANION (ICSA-C 2018), 2018: 68-71
  • [43] Streaming Data Analysis: Clustering or Classification?
    Bezdek, James C.
    Keller, James M.
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2021, 51 (01): 91-102
  • [44] Streaming Massive Electric Power Data Analysis Based on Spark Streaming
    Zhang, Xudong
    Qian, Zhongwen
    Shen, Siqi
    Shi, Jia
    Wang, Shujun
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, 2019, 11448: 200-212
  • [45] A data lake-based security transmission and storage scheme for streaming big data
    Zhao, Xiaoyan
    Zhang, Conghui
    Guan, Shaopeng
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (04): 4741-4755
  • [46] Concurrent and Storage-Aware Data Streaming for Data Processing Workflows in Grid Environments
    张文
    曹军威
    钟宜生
    刘连臣
    吴澄
    Tsinghua Science and Technology, 2010, 15 (03): 335-346
  • [47] Security-Aware Data Allocation in Multicloud Scenarios
    di Vimercati, Sabrina De Capitani
    Foresti, Sara
    Livraga, Giovanni
    Piuri, Vincenzo
    Samarati, Pierangela
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2021, 18 (05): 2456-2468
  • [48] Efficient Data Migration to Conserve Energy in Streaming Media Storage Systems
    Chai, Yunpeng
    Du, Zhihui
    Bader, David A.
    Qin, Xiao
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2012, 23 (11): 2081-2093
  • [49] Caching and data allocation for streaming service in MEMS-based storage
    Kwon, Ohhoon
    Yoo, Yunjung
    Bahn, Hyokyung
    Koh, Kern
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCES AND ITS APPLICATIONS, PROCEEDINGS, 2008: 132-+
  • [50] StreamStorage: High-throughput and Scalable Storage Technology for Streaming Data
    Maeda, Munenori
    Ozawa, Toshihiro
    FUJITSU SCIENTIFIC & TECHNICAL JOURNAL, 2014, 50 (01): 24-29