ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation

Cited by: 28
Authors
Kara, Kaan [1]
Eguro, Ken [2]
Zhang, Ce [1]
Alonso, Gustavo [1]
Affiliations
[1] Swiss Federal Institute of Technology (ETH Zurich), Department of Computer Science, Systems Group, Zurich, Switzerland
[2] Microsoft Research, Redmond, WA, USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2018 / Vol. 12 / No. 4
Keywords
REAL-TIME; SCALE
DOI
10.14778/3297753.3297756
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The ability to perform machine learning (ML) tasks in a database management system (DBMS) provides the data analyst with a powerful tool. Unfortunately, integrating ML into a DBMS is challenging for reasons ranging from differences in execution model to data layout requirements. In this paper, we take a column-store main-memory DBMS, optimized for online analytical processing, as our initial system. On this system, we explore the integration of coordinate-descent-based methods that work natively on the columnar format to train generalized linear models. We use a cache-efficient, partitioned stochastic coordinate descent algorithm that provides linear throughput scalability with the number of cores (up to 14 cores in our experiments) while preserving convergence quality. Existing column-oriented DBMSs rely on compression and even encryption to store data in memory. When those features are considered, the performance of a CPU-based solution suffers. Thus, we also show how to exploit hardware acceleration as part of a hybrid CPU+FPGA system to provide on-the-fly data transformation combined with an FPGA-based coordinate-descent engine. The resulting system is a column-store DBMS that preserves its important features (e.g., data compression) while offering high-performance machine learning capabilities.
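Coordinate descent suits a column store because each update of a single weight w_j reads only column j (plus a dense residual vector), so the scan is contiguous in a columnar layout. The following is a minimal, single-threaded Python sketch under stated assumptions: it uses ridge regression (squared loss with an L2 penalty) as the generalized linear model, plain Python lists as the column-major layout, and a hypothetical function name `scd_ridge`. It does not model the paper's partitioned multicore scheme, compressed or encrypted columns, or the FPGA engine.

```python
import random
from typing import List


def scd_ridge(columns: List[List[float]], y: List[float], lam: float,
              epochs: int, seed: int = 0) -> List[float]:
    """Illustrative stochastic coordinate descent (SCD) for ridge regression.

    Minimizes (1/(2n)) * ||y - Xw||^2 + (lam/2) * ||w||^2, where X is stored
    column-major: columns[j] holds feature j for all n samples (assumed layout,
    mirroring how a column store keeps one attribute contiguous).
    """
    n = len(y)
    w = [0.0] * len(columns)
    r = list(y)  # residual r = y - Xw; equals y while w is all zeros
    # Squared column norms scaled by 1/n, computed once up front.
    sq = [sum(v * v for v in col) / n for col in columns]
    rng = random.Random(seed)
    order = list(range(len(columns)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit coordinates in a fresh random order each epoch
        for j in order:
            col = columns[j]
            denom = sq[j] + lam
            if denom == 0.0:
                continue  # all-zero column and no regularization: nothing to update
            # Exact minimizer of the objective along coordinate j (one column scan).
            dot = sum(c * ri for c, ri in zip(col, r)) / n
            new_wj = (dot + sq[j] * w[j]) / denom
            delta = new_wj - w[j]
            # Keep the residual consistent with the new weight (second column scan).
            for i, c in enumerate(col):
                r[i] -= delta * c
            w[j] = new_wj
    return w


if __name__ == "__main__":
    # Two correlated features; y = 2 * x0 - 1 * x1 exactly.
    cols = [[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 1.0, 0.0]]
    y = [1.0, 4.0, 5.0, 8.0]
    w = scd_ridge(cols, y, lam=0.0, epochs=200)
    print([round(v, 6) for v in w])  # → [2.0, -1.0]
```

Because the residual vector is updated in place after every coordinate step, each update costs two passes over one column rather than a pass over the whole table; the random coordinate order is what makes this the "stochastic" variant of coordinate descent.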
Pages: 348-361
Number of pages: 14
Related papers (50 in total)
  • [1] A spatial column-store to triangulate The Netherlands on the fly
    Goncalves, Romulo
    van Tilburg, Tom
    Kyzirakos, Kostis
    Alvanaki, Foteini
    Koutsourakis, Panagiotis
    van Werkhoven, Ben
    van Hage, Willem
    24TH ACM SIGSPATIAL INTERNATIONAL CONFERENCE ON ADVANCES IN GEOGRAPHIC INFORMATION SYSTEMS (ACM SIGSPATIAL GIS 2016), 2016
  • [2] A data reusing strategy in column-store data warehouse
    Wang, M. (wangmei@dhu.edu.cn), 1626, Science Press (36):
  • [3] Heuristic mechanism for query optimization in column-store data warehouse
    Yan Q.-L.
    Sun L.
    Wang M.
    Le J.-J.
    Liu G.-H.
    Jisuanji Xuebao/Chinese Journal of Computers, 2011, 34 (10): 2018-2026
  • [4] On-the-fly Data Transformation in Action
    Mun, Ju Hyoung
    Karatsenidis, Konstantinos
    Papon, Tarikul Islam
    Roozkhosh, Shahin
    Hoornaert, Denis
    Drepper, Ulrich
    Sanaullah, Ahmed
    Mancuso, Renato
    Athanassoulis, Manos
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (12): 3950-3953
  • [5] Materialization strategies in big data analysis system based on column-store
    Zhang, Bin
    Le, Jiajin
    Sun, Li
    Xia, Xiaoling
    Wang, Mei
    Li, Yefeng
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2015, 52 (05): 1061-1070
  • [6] Column-Store Support for RDF Data Management: not all swans are white
    Sidirourgos, Lefteris
    Goncalves, Romulo
    Kersten, Martin
    Nes, Niels
    Manegold, Stefan
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2008, 1 (02): 1553-1563
  • [7] Workload-Driven Placement of Column-Store Data Structures on DRAM and NVM
    Lasch, Robert
    Schulze, Robert
    Legler, Thomas
    Sattler, Kai-Uwe
    17TH INTERNATIONAL WORKSHOP ON DATA MANAGEMENT ON NEW HARDWARE, DAMON 2021, 2021
  • [8] OB-Tree: Accelerating Data Cleaning in Out-of-Core Column-Store Databases
    Yu, Feng
    Latronics, Brandon J.
    Matacic, Tyler
    Jones, Eric S.
    2017 IEEE 6TH INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS 2017), 2017: 185-192
  • [9] On-the-Fly Machine Learning for Improving Image Resolution in Tomography
    Hendriksen, Allard A.
    Pelt, Daniel M.
    Palenstijn, Willem Jan
    Coban, Sophia B.
    Batenburg, Kees Joost
    APPLIED SCIENCES-BASEL, 2019, 9 (12)
  • [10] On-the-Fly Optimization of Synchrotron Beamlines Using Machine Learning
    Morris, T. W.
    Rakitin, M.
    Giles, A.
    Lynch, J.
    Walter, A. L.
    Nash, B.
    Abell, D.
    Moeller, P.
    Pogorelov, I.
    Goldring, N.
    OPTICAL SYSTEM ALIGNMENT, TOLERANCING, AND VERIFICATION XIV, 2022, 12222