ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation

Cited by: 28
Authors
Kara, Kaan [1]
Eguro, Ken [2]
Zhang, Ce [1]
Alonso, Gustavo [1]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Syst Grp, Zurich, Switzerland
[2] Microsoft Res, Redmond, WA USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2018 / Vol. 12 / Issue 04
Keywords
REAL-TIME; SCALE;
DOI
10.14778/3297753.3297756
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The ability to perform machine learning (ML) tasks in a database management system (DBMS) provides the data analyst with a powerful tool. Unfortunately, integration of ML into a DBMS is challenging for reasons varying from differences in execution model to data layout requirements. In this paper, we assume a column-store main-memory DBMS, optimized for online analytical processing, as our initial system. On this system, we explore the integration of coordinate-descent-based methods working natively on columnar format to train generalized linear models. We use a cache-efficient, partitioned stochastic coordinate descent algorithm providing linear throughput scalability with the number of cores while preserving convergence quality, up to 14 cores in our experiments. Existing column-oriented DBMSs rely on compression and even encryption to store data in memory. When those features are considered, the performance of a CPU-based solution suffers. Thus, in the paper we also show how to exploit hardware acceleration as part of a hybrid CPU+FPGA system to provide on-the-fly data transformation combined with an FPGA-based coordinate-descent engine. The resulting system is a column-store DBMS with its important features preserved (e.g., data compression) that offers high-performance machine learning capabilities.
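To illustrate why coordinate descent suits a columnar layout, the following is a minimal sketch of plain stochastic coordinate descent for a least-squares linear model, where each update reads exactly one feature column. It is not the paper's cache-efficient, partitioned algorithm and it does not model compression, encryption, or the FPGA engine. The function name `scd_ridge`, its parameters, and the toy data are assumptions made for this illustration only.

```python
import random


def scd_ridge(columns, y, lam=0.0, epochs=50, seed=0):
    """Hypothetical helper: stochastic coordinate descent for
    (1/2n) * ||Xw - y||^2 + (lam/2) * ||w||^2.

    `columns` is a list of feature columns (column-store layout),
    each a list of n floats; `y` is the list of n labels.
    """
    n = len(y)
    w = [0.0] * len(columns)
    # residual = Xw - y; with w = 0 this is simply -y
    residual = [-yi for yi in y]
    rng = random.Random(seed)
    order = list(range(len(columns)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit coordinates in a random order each epoch
        for j in order:
            col = columns[j]  # a single column scan per update
            grad = sum(c * r for c, r in zip(col, residual)) / n + lam * w[j]
            curv = sum(c * c for c in col) / n + lam
            if curv == 0.0:
                continue  # all-zero column without penalty: nothing to update
            step = -grad / curv  # exact minimizer along coordinate j
            w[j] += step
            for i, c in enumerate(col):
                residual[i] += step * c  # keep residual consistent with w
    return w


# Toy data (assumed): y = 2 * x + 1, with a constant column for the intercept.
columns = [[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]]
y = [3.0, 5.0, 7.0, 9.0]
w = scd_ridge(columns, y, epochs=500)
print(w)
```

Because every update touches one column and the shared residual vector, the access pattern matches how a column store lays data out in memory, which is the property the abstract relies on.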
Pages: 348-361
Page count: 14