Building and Operating a Large-Scale Enterprise Data Analytics Platform

Cited by: 6
Authors
Bauer, Daniel [1]
Froese, Florian [1]
Garces-Erice, Luis [1]
Giblin, Chris [1]
Labbi, Abdel [1]
Nagy, Zoltan A. [1]
Pardon, Niels [1]
Rooney, Sean [1]
Urbanetz, Peter [1]
Vetsch, Pascal [1]
Wespi, Andreas [1]
Affiliations
[1] IBM Res Europe, Saumerstr 4, CH-8803 Ruschlikon, Switzerland
Keywords
Hybrid cloud; Datalake; Storage; Ingestion; SQL/Hadoop; Governance
DOI
10.1016/j.bdr.2020.100181
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Over the last three years we have been running a large-scale data processing platform for applying analytics to corporate data on an OpenStack private cloud instance. Our platform makes a wide variety of corporate data assets, such as sales, marketing, and customer information, as well as data from less conventional sources such as weather, news, and social media, available for analytics purposes to hundreds of globally distributed teams across the company. We control every layer in the stack, from the processing engines down to the hardware. Here we report our experiences in building and operating such a system. We describe our technical choices and how they evolved as we observed the actual workloads created by users. © 2020 The Authors. Published by Elsevier Inc.
Pages: 20