Building and Operating a Large-Scale Enterprise Data Analytics Platform

Cited by: 6
Authors
Bauer, Daniel [1]
Froese, Florian [1]
Garces-Erice, Luis [1]
Giblin, Chris [1]
Labbi, Abdel [1]
Nagy, Zoltan A. [1]
Pardon, Niels [1]
Rooney, Sean [1]
Urbanetz, Peter [1]
Vetsch, Pascal [1]
Wespi, Andreas [1]
Affiliations
[1] IBM Res Europe, Saumerstr 4, CH-8803 Ruschlikon, Switzerland
Keywords
Hybrid cloud; Datalake; Storage; Ingestion; SQL/Hadoop; Governance
DOI
10.1016/j.bdr.2020.100181
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Over the last three years we have been running a large-scale data processing platform for applying analytics to corporate data on an OpenStack private cloud instance. Our platform makes a wide variety of corporate data assets, such as sales, marketing and customer information, as well as data from less conventional sources such as weather, news and social media, available for analytics purposes to hundreds of globally distributed teams across the company. We control every layer in the stack, from the processing engines down to the hardware. Here we report our experiences in building and operating such a system. We describe our technical choices and how they evolved as we observed the actual workloads created by users. © 2020 The Authors. Published by Elsevier Inc.
Pages: 20
Related papers
50 in total
  • [31] TerraBrasilis: A Spatial Data Analytics Infrastructure for Large-Scale Thematic Mapping
    Assis, Luiz Fernando F. G.
    Ferreira, Karine Reis
    Vinhas, Lubia
    Maurano, Luis
    Almeida, Claudio
    Carvalho, Andre
    Rodrigues, Jether
    Maciel, Adeline
    Camargo, Claudinei
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2019, 8 (11)
  • [32] BANKSAFE: Visual analytics for big data in large-scale computer networks
    Fischer, Fabian
    Fuchs, Johannes
    Mansmann, Florian
    Keim, Daniel A.
    INFORMATION VISUALIZATION, 2015, 14 (01) : 51 - 61
  • [33] Visual Analytics to make sense of large-scale administrative and normative data
    Guarino, Alfonso
    Lettieri, Nicola
    Malandrino, Delfina
    Russo, Pietro
    Zaccagnino, Rocco
    2019 23RD INTERNATIONAL CONFERENCE INFORMATION VISUALISATION (IV): BIOMEDICAL VISUALIZATION AND GEOMETRIC MODELLING & IMAGING, 2019, : 133 - 138
  • [34] Effective ensemble learning approach for large-scale medical data analytics
    Lakshmana Rao Namamula
    Daniel Chaytor
    International Journal of System Assurance Engineering and Management, 2024, 15 : 13 - 20
  • [35] Going Digital: A Survey on Digitalization and Large-Scale Data Analytics in Healthcare
    Tresp, Volker
    Overhage, J. Marc
    Bundschus, Markus
    Rabizadeh, Shahrooz
    Fasching, Peter A.
    Yu, Shipeng
    PROCEEDINGS OF THE IEEE, 2016, 104 (11) : 2180 - 2206
  • [36] A Novel Visual analytics Approach for Clustering Large-Scale Social Data
    Wang, Zhangye
    Zhou, Juanxia
    Chen, Wei
    Chen, Chang
    Liao, Jiyuan
    Maciejewski, Ross
    2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2013,
  • [37] Efficient Graph Analytics in Python for Large-Scale Data Science
    Zhou, Xiantian
    Ordonez, Carlos
    BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY (DAWAK 2021), 2021, 12925 : 158 - 164
  • [38] Integrating Online Compression to Accelerate Large-Scale Data Analytics Applications
    Bicer, Tekin
    Yin, Jian
    Chiu, David
    Agrawal, Gagan
    Schuchardt, Karen
    IEEE 27TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2013), 2013, : 1205 - 1216
  • [39] Large-scale simulation platform
    Institute of Cybernetics, Tallinn Technical University, Akadeemia tee 21, 12618 Tallinn, Estonia
    WSEAS Trans. Comput., 2007, 1 (65-71):
  • [40] An Open Transportation Network Resilience Analytics Platform for Large-Scale Urban Accessibility Analysis
    Castro, Edgar
    Wang, Qi
    Akhavan, Armin
    CONSTRUCTION RESEARCH CONGRESS 2018: INFRASTRUCTURE AND FACILITY MANAGEMENT, 2018, : 213 - 221