Shock: Active Storage for Multicloud Streaming Data Analysis

Cited by: 2
Authors
Bischof, Jared [1, 2]
Wilke, Andreas [1, 2]
Gerlach, Wolfgang [1, 2]
Harrison, Travis [1, 2]
Paczian, Tobias [1, 2]
Tang, Wei [3]
Trimble, William [1, 2]
Wilkening, Jared [4]
Desai, Narayan [5]
Meyer, Folker [1, 2]
Affiliations
[1] Argonne Natl Lab, Argonne, IL 60439 USA
[2] Univ Chicago, Chicago, IL 60637 USA
[3] Google Inc, Mountain View, CA USA
[4] Dramafever Inc, New York, NY USA
[5] Ericsson, San Jose, CA USA
Keywords
bioinformatics; metagenomics; active object store; distributed wide-area computing
DOI
10.1109/BDC.2015.40
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Access to data plays a major role in designing and performing efficient data computation and analyses in a distributed environment. Existing approaches access data in a variety of ways, each with benefits and drawbacks that depend on the use case. Our original use case was the computational analysis of environmental sequence data, or metagenomics. Many other workflows reduce the dataset size dramatically within the first few processing steps, owing to biologically motivated data compression. Metagenomic data, by contrast, compresses poorly, so metagenomic workflows add to the size of the data set along the processing pipeline. Thus, wide-area, high-throughput access to the data is essential. To address this problem, we developed Shock, a data store for files, their associated metadata, and indexes that allow Shock to provide different views into the data. Shock comprises three major components: a web service that provides a RESTful API, backend data storage for files, and storage for object metadata. Shock has proven to be a stable data store for MG-RAST, an application that served over 40,000 users in 2014 on a server that houses more than 3 million data objects. Moreover, Shock provides both subselection and high-performance file transfer capabilities that serve most use cases.
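The idea of a store that keeps files, metadata, and indexes together, and uses the indexes to serve partial views, can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: it is not Shock's code or its REST API, and every name in it (`ToyStore`, `Node`, `build_line_index`, `view`, `find`) is hypothetical. It shows how a per-line index of byte offsets lets a client request a subselection of records without reading the whole file.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """One stored object (hypothetical layout): file bytes, metadata, indexes."""
    data: bytes
    attributes: dict
    # index name -> list of (byte offset, length) spans, one per record
    indexes: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)


class ToyStore:
    """In-memory stand-in for a file store plus a metadata store."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Node] = {}

    def create(self, data: bytes, attributes: dict) -> str:
        """Store a file with its metadata and return a new node id."""
        node_id = str(uuid.uuid4())
        self._nodes[node_id] = Node(data, dict(attributes))
        return node_id

    def build_line_index(self, node_id: str) -> None:
        """Record the byte span of every line under the index name 'line'."""
        node = self._nodes[node_id]
        spans, offset = [], 0
        for line in node.data.splitlines(keepends=True):
            spans.append((offset, len(line)))
            offset += len(line)
        node.indexes["line"] = spans

    def view(self, node_id: str, index: str, first: int, last: int) -> bytes:
        """Return records first..last (1-based, inclusive) via the named index."""
        node = self._nodes[node_id]
        spans = node.indexes[index][first - 1:last]
        return b"".join(node.data[o:o + n] for o, n in spans)

    def find(self, **query) -> List[str]:
        """Return ids of nodes whose metadata matches every key/value given."""
        return [
            node_id
            for node_id, node in self._nodes.items()
            if all(node.attributes.get(k) == v for k, v in query.items())
        ]


if __name__ == "__main__":
    store = ToyStore()
    nid = store.create(b">r1\nACGT\n>r2\nGGCC\n", {"project": "demo"})
    store.build_line_index(nid)
    print(store.view(nid, "line", 3, 4))   # second FASTA-style record only
    print(store.find(project="demo") == [nid])
```

In a real deployment the three roles would be separate services (web front end, file backend, metadata database), as the abstract describes; the sketch collapses them into one class purely to show how an index turns a whole-file download into a subselection.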
Pages: 68-72
Page count: 5