Adding data analytics capabilities to scaled-out object store

Authors
Karakoyunlu, Cengiz [1]
Chandy, John A. [1]
Riska, Alma [2]
Affiliations
[1] Univ Connecticut, Dept Elect & Comp Engn, Storrs, CT 06269 USA
[2] NetApp Inc, Waltham, MA USA
Keywords
In-situ data analytics; Object storage; Attribute-based storage; MapReduce
DOI
10.1016/j.jss.2016.07.029
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
This work focuses on enabling effective data analytics on scaled-out object storage systems. Typically, applications perform MapReduce computations by first copying large amounts of data to a separate compute cluster (i.e., a Hadoop cluster). However, this approach is not very efficient considering that storage systems can host hundreds of petabytes of data. Network bandwidth can be easily saturated, and the overall energy consumption increases during large-scale data transfers. Instead of moving data between remote clusters, we propose the implementation of a data analytics layer on an object-based storage cluster to perform in-place MapReduce computation on existing data. The analytics layer is tied to the underlying object store, utilizing its data redundancy and distribution policies across the cluster. We implemented this approach with the Ceph object storage system and Hadoop, and conducted evaluations with various benchmarks. Performance evaluations show that initial data copy performance is improved by up to 96% and MapReduce performance is improved by up to 20% compared to the stock Hadoop implementation. © 2016 Elsevier Inc. All rights reserved.
Pages: 16-27
Number of pages: 12