Building and Operating a Large-Scale Enterprise Data Analytics Platform

Cited by: 6
Authors
Bauer, Daniel [1]
Froese, Florian [1]
Garces-Erice, Luis [1]
Giblin, Chris [1]
Labbi, Abdel [1]
Nagy, Zoltan A. [1]
Pardon, Niels [1]
Rooney, Sean [1]
Urbanetz, Peter [1]
Vetsch, Pascal [1]
Wespi, Andreas [1]
Affiliation
[1] IBM Research Europe, Säumerstrasse 4, CH-8803 Rüschlikon, Switzerland
Keywords
Hybrid cloud; Datalake; Storage; Ingestion; SQL/Hadoop; Governance
DOI
10.1016/j.bdr.2020.100181
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Over the last three years we have been running a large-scale data processing platform for applying analytics to corporate data on an OpenStack private cloud instance. Our platform makes a wide variety of corporate data assets (such as sales, marketing and customer information), as well as data from less conventional sources such as weather, news and social media, available for analytics to hundreds of globally distributed teams across the company. We control every layer in the stack, from the processing engines down to the hardware. Here we report our experiences in building and operating such a system. We describe our technical choices and how they evolved as we observed the actual workloads created by users. (C) 2020 The Authors. Published by Elsevier Inc.
Pages: 20