Building and Operating a Large-Scale Enterprise Data Analytics Platform

Cited by: 6
Authors
Bauer, Daniel [1]
Froese, Florian [1]
Garces-Erice, Luis [1]
Giblin, Chris [1]
Labbi, Abdel [1]
Nagy, Zoltan A. [1]
Pardon, Niels [1]
Rooney, Sean [1]
Urbanetz, Peter [1]
Vetsch, Pascal [1]
Wespi, Andreas [1]
Affiliations
[1] IBM Research Europe, Säumerstrasse 4, CH-8803 Rüschlikon, Switzerland
Keywords
Hybrid cloud; Datalake; Storage; Ingestion; SQL/Hadoop; Governance
DOI
10.1016/j.bdr.2020.100181
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
For the last three years we have run a large-scale data processing platform that applies analytics to corporate data on an OpenStack private cloud instance. The platform makes a wide variety of corporate data assets, such as sales, marketing and customer information, as well as data from less conventional sources such as weather, news and social media, available for analytics to hundreds of globally distributed teams across the company. We control every layer in the stack, from the processing engines down to the hardware. Here we report our experiences in building and operating such a system. We describe our technical choices and how they evolved as we observed the actual workloads created by users. © 2020 The Authors. Published by Elsevier Inc.
Pages: 20