Compressed linear algebra for large-scale machine learning

Cited by: 13
|
Authors
Elgohary, Ahmed [2 ]
Boehm, Matthias [1 ]
Haas, Peter J. [1 ]
Reiss, Frederick R. [1 ]
Reinwald, Berthold [1 ]
Affiliations
[1] IBM Res Almaden, San Jose, CA 95120 USA
[2] Univ Maryland, College Pk, MD 20742 USA
Source
VLDB JOURNAL | 2018, Vol. 27, No. 5
Keywords
Machine learning; Large-scale; Declarative; Linear algebra; Lossless compression; DATABASE; FACTORIZATION; DB2;
DOI
10.1007/s00778-017-0478-1
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Discipline Code
0812;
Abstract
Large-scale machine learning algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model. It is crucial for performance to fit the data into single-node or distributed main memory and enable fast matrix-vector operations on in-memory data. General-purpose, heavy- and lightweight compression techniques struggle to achieve both good compression ratios and fast decompression speed to enable block-wise uncompressed operations. Therefore, we initiate work, inspired by database compression and sparse matrix formats, on value-based compressed linear algebra (CLA), in which heterogeneous, lightweight database compression techniques are applied to matrices, and then linear algebra operations such as matrix-vector multiplication are executed directly on the compressed representation. We contribute effective column compression schemes, cache-conscious operations, and an efficient sampling-based compression algorithm. Our experiments show that CLA achieves in-memory operations performance close to the uncompressed case and good compression ratios, which enables fitting substantially larger datasets into available memory. We thereby obtain significant end-to-end performance improvements.
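To make the core idea concrete, below is a minimal NumPy sketch of a matrix-vector multiply executed directly on a compressed column group, in the spirit of the offset-list encoding the abstract alludes to. It is an illustration under stated assumptions, not the paper's implementation: the names OLEColumnGroup and rmult are hypothetical, and real CLA additionally supports other encodings, co-coding decisions, and cache-conscious blocking.

```python
import numpy as np

class OLEColumnGroup:
    """Hypothetical offset-list-encoded column group: the columns it covers,
    the distinct value tuples appearing in those columns, and, per tuple,
    the list of row offsets where that tuple occurs."""

    def __init__(self, col_indices, value_tuples, offset_lists):
        self.col_indices = col_indices    # columns covered by this group
        self.value_tuples = value_tuples  # distinct value tuples
        self.offset_lists = offset_lists  # row offsets per distinct tuple

    def rmult(self, v, y):
        """Accumulate this group's contribution to y = X @ v directly on the
        compressed form: pre-aggregate the dot product once per distinct
        tuple, then scatter-add it to the rows where the tuple occurs."""
        vg = v[self.col_indices]
        for t, rows in zip(self.value_tuples, self.offset_lists):
            u = float(np.dot(t, vg))  # one multiply per distinct tuple
            y[rows] += u              # reused across all its occurrences

# Usage: a 6x3 matrix where columns {0,1} are co-coded and column 2 is
# encoded separately; no row of the matrix is ever materialized.
g1 = OLEColumnGroup(np.array([0, 1]),
                    [np.array([1.0, 2.0]), np.array([3.0, 4.0])],
                    [np.array([0, 2, 5]), np.array([1, 3, 4])])
g2 = OLEColumnGroup(np.array([2]),
                    [np.array([7.0])],
                    [np.arange(6)])

v = np.array([1.0, 1.0, 1.0])
y = np.zeros(6)
for g in (g1, g2):
    g.rmult(v, y)
print(y)  # [10. 14. 10. 14. 14. 10.]
```

The pre-aggregation step is why the cost scales with the number of distinct tuples plus total offsets rather than with the dense matrix size, which is what lets operations on the compressed representation approach uncompressed speed.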
Pages: 719-744
Number of pages: 26
Related papers
50 records in total
  • [1] Compressed Linear Algebra for Large-Scale Machine Learning
    Elgohary, Ahmed
    Boehm, Matthias
    Haas, Peter J.
    Reiss, Frederick R.
    Reinwald, Berthold
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (12): 960-971
  • [2] Compressed linear algebra for large-scale machine learning
    Elgohary, Ahmed
    Boehm, Matthias
    Haas, Peter J.
    Reiss, Frederick R.
    Reinwald, Berthold
    [J]. The VLDB Journal, 2018, 27: 719-744
  • [3] Compressed Linear Algebra for Declarative Large-Scale Machine Learning
    Elgohary, Ahmed
    Boehm, Matthias
    Haas, Peter J.
    Reiss, Frederick R.
    Reinwald, Berthold
    [J]. COMMUNICATIONS OF THE ACM, 2019, 62 (05): 83-91
  • [4] Scaling Machine Learning via Compressed Linear Algebra
    Elgohary, Ahmed
    Boehm, Matthias
    Haas, Peter J.
    Reiss, Frederick R.
    Reinwald, Berthold
    [J]. SIGMOD RECORD, 2017, 46 (01): 42-49
  • [5] Technical Perspective: Scaling Machine Learning via Compressed Linear Algebra
    Ives, Zachary G.
    [J]. SIGMOD RECORD, 2017, 46 (01): 41
  • [6] A Survey on Large-Scale Machine Learning
    Wang, Meng
    Fu, Weijie
    He, Xiangnan
    Hao, Shijie
    Wu, Xindong
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (06): 2574-2594
  • [7] Linear algebra software for large-scale accelerated multicore computing
    Abdelfatah, A.
    Anzt, H.
    Dongarra, J.
    Gates, M.
    Haidar, A.
    Kurzak, J.
    Luszczek, P.
    Tomov, S.
    Yamazaki, I.
    YarKhan, A.
    [J]. ACTA NUMERICA, 2016, 25: 1-160
  • [8] Optimizing Sparse Linear Algebra for Large-Scale Graph Analytics
    Buono, Daniele
    Gunnels, John A.
    Que, Xinyu
    Checconi, Fabio
    Petrini, Fabrizio
    Tuan, Tai-Ching
    Long, Chris
    [J]. COMPUTER, 2015, 48 (08): 26-34
  • [9] Large-scale distributed linear algebra with tensor processing units
    Lewis, Adam G. M.
    Beall, Jackson
    Ganahl, Martin
    Hauru, Markus
    Mallick, Shrestha Basu
    Vidal, Guifre
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2022, 119 (33)
  • [10] Efficient Machine Learning On Large-Scale Graphs
    Erickson, Parker
    Lee, Victor E.
    Shi, Feng
    Tang, Jiliang
    [J]. PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022: 4788-4789