Adding data analytics capabilities to scaled-out object store

Authors
Karakoyunlu, Cengiz [1]
Chandy, John A. [1]
Riska, Alma [2]
Affiliations
[1] Univ Connecticut, Dept Elect & Comp Engn, Storrs, CT 06269 USA
[2] NetApp Inc, Waltham, MA USA
Keywords
In-situ data analytics; Object storage; Attribute-based storage; MapReduce
DOI
10.1016/j.jss.2016.07.029
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
This work focuses on enabling effective data analytics on scaled-out object storage systems. Typically, applications perform MapReduce computations by first copying large amounts of data to a separate compute cluster (i.e., a Hadoop cluster). However, this approach is not very efficient considering that storage systems can host hundreds of petabytes of data. Network bandwidth can be easily saturated and the overall energy consumption would increase during large-scale data transfer. Instead of moving data between remote clusters, we propose the implementation of a data analytics layer on an object-based storage cluster to perform in-place MapReduce computation on existing data. The analytics layer is tied to the underlying object store, utilizing its data redundancy and distribution policies across the cluster. We implemented this approach with the Ceph object storage system and Hadoop, and conducted evaluations with various benchmarks. Performance evaluations show that initial data copy performance is improved by up to 96% and the MapReduce performance is improved by up to 20% compared to the stock Hadoop implementation. (C) 2016 Elsevier Inc. All rights reserved.
Pages: 16-27
Page count: 12
Related papers
50 in total
  • [31] Decoding Data Analytics Capabilities from Topic Modeling on Press Releases
    Bonilla, JeanCarlo
    Rao, Bharat
    PICMET '15 PORTLAND INTERNATIONAL CENTER FOR MANAGEMENT OF ENGINEERING AND TECHNOLOGY, 2015, : 1959 - 1968
  • [32] Big data analytics capabilities: a systematic literature review and research agenda
    Mikalef, Patrick
    Pappas, Ilias O.
    Krogstie, John
    Giannakos, Michail
    INFORMATION SYSTEMS AND E-BUSINESS MANAGEMENT, 2018, 16 (03) : 547 - 578
  • [33] Positioning big data analytics capabilities towards financial service agility
    Edu, Abeeku Sam
    ASLIB JOURNAL OF INFORMATION MANAGEMENT, 2022, 74 (04) : 569 - 588
  • [34] Big data analytics capabilities: a systematic literature review and research agenda
    Patrick Mikalef
    Ilias O. Pappas
    John Krogstie
    Michail Giannakos
    Information Systems and e-Business Management, 2018, 16 : 547 - 578
  • [35] Big data analytics capabilities and leadership: catalysts of firm performance in telecommunications
    Shafqat, Hira
    Zhang, Baojian
    Ahmed, Muhammad
    Ullah, Muhammad Rizwan
    Zulfiqar, Muhammad
    BUSINESS PROCESS MANAGEMENT JOURNAL, 2024,
  • [36] Big Data Analytics as an Enabler of Process Innovation Capabilities: A Configurational Approach
    Mikalef, Patrick
    Krogstie, John
    BUSINESS PROCESS MANAGEMENT (BPM 2018), 2018, 11080 : 426 - 441
  • [37] Big data analytics capabilities and firm performance: An integrated MCDM approach
    Yasmin, Mariam
    Tatoglu, Ekrem
    Kilic, Huseyin Selcuk
    Zaim, Selim
    Delen, Dursun
    JOURNAL OF BUSINESS RESEARCH, 2020, 114 : 1 - 15
  • [38] A Literature Review on Big Data Analytics Capabilities
    Shdifat, B.
    Cetindamar, D.
    Erfani, S.
    2019 PORTLAND INTERNATIONAL CONFERENCE ON MANAGEMENT OF ENGINEERING AND TECHNOLOGY (PICMET), 2019,
  • [39] Big data analytics capabilities and knowledge management: impact on firm performance
    Ferraris, Alberto
    Mazzoleni, Alberto
    Devalle, Alain
    Couturier, Jerome
    MANAGEMENT DECISION, 2019, 57 (08) : 1923 - 1936
  • [40] Big data analytics capabilities and innovation effect of dynamic capabilities, organizational culture and role of management accountants
    Munir, Sabra
    Rasid, Siti Zaleha Abdul
    Aamir, Muhammad
    Jamil, Farrukh
    Ahmed, Ishfaq
    FORESIGHT, 2023, 25 (01) : 41 - 66