Adding data analytics capabilities to scaled-out object store

Authors
Karakoyunlu, Cengiz [1]
Chandy, John A. [1]
Riska, Alma [2]
Affiliations
[1] Univ Connecticut, Dept Elect & Comp Engn, Storrs, CT 06269 USA
[2] NetApp Inc, Waltham, MA USA
Keywords
In-situ data analytics; Object storage; Attribute-based storage; MapReduce
DOI
10.1016/j.jss.2016.07.029
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
This work focuses on enabling effective data analytics on scaled-out object storage systems. Typically, applications perform MapReduce computations by first copying large amounts of data to a separate compute cluster (i.e., a Hadoop cluster). However, this approach is inefficient given that storage systems can host hundreds of petabytes of data: large-scale data transfers can easily saturate network bandwidth and increase overall energy consumption. Instead of moving data between remote clusters, we propose implementing a data analytics layer on an object-based storage cluster to perform in-place MapReduce computation on existing data. The analytics layer is tied to the underlying object store and uses its data redundancy and distribution policies across the cluster. We implemented this approach with the Ceph object storage system and Hadoop, and evaluated it with various benchmarks. Performance evaluations show that initial data copy performance improves by up to 96% and MapReduce performance improves by up to 20% compared to the stock Hadoop implementation. © 2016 Elsevier Inc. All rights reserved.
Pages: 16-27
Page count: 12