Building and Operating a Large-Scale Enterprise Data Analytics Platform

Cited by: 6
Authors
Bauer, Daniel [1]
Froese, Florian [1]
Garces-Erice, Luis [1]
Giblin, Chris [1]
Labbi, Abdel [1]
Nagy, Zoltan A. [1]
Pardon, Niels [1]
Rooney, Sean [1]
Urbanetz, Peter [1]
Vetsch, Pascal [1]
Wespi, Andreas [1]
Affiliation
[1] IBM Res Europe, Saumerstr 4, CH-8803 Ruschlikon, Switzerland
Keywords
Hybrid cloud; Datalake; Storage; Ingestion; SQL/Hadoop; Governance
DOI
10.1016/j.bdr.2020.100181
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Over the last three years, we have been running a large-scale data processing platform for applying analytics to corporate data on an OpenStack private cloud instance. Our platform makes a wide variety of corporate data assets, such as sales, marketing, and customer information, as well as data from less conventional sources such as weather, news, and social media, available for analytics to hundreds of globally distributed teams across the company. We control every layer in the stack, from the processing engines down to the hardware. Here we report our experiences in building and operating such a system. We describe our technical choices and how they evolved as we observed the actual workloads created by users. © 2020 The Authors. Published by Elsevier Inc.
Pages: 20