Matrix Factorization on GPUs with Memory Optimization and Approximate Computing

Cited by: 2
Authors
Tan, Wei [1,2]
Chang, Shiyu [2]
Fong, Liana [2]
Li, Cheng [3]
Wang, Zijun [2]
Cao, LiangLiang [2,4]
Affiliations
[1] Citadel, Chicago, IL 60603 USA
[2] IBM Res, Yorktown Hts, NY USA
[3] Univ Illinois, Urbana, IL USA
[4] HelloVera AI, New York, NY USA
Keywords
Matrix factorization; machine learning; parallelization; GPU; CUDA
DOI
10.1145/3225058.3225096
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Matrix factorization (MF) discovers latent features from observations and has shown great promise in collaborative filtering, data compression, feature extraction, word embedding, and related fields. While many problem-specific optimization techniques have been proposed, alternating least squares (ALS) remains popular due to its general applicability (e.g., it easily handles positive-unlabeled inputs), fast convergence, and parallelization capability. Current MF implementations either are optimized for a single machine or require a large computer cluster, yet both remain insufficient. This is because a single machine provides limited compute power for large-scale data, while multiple machines suffer from the network communication bottleneck. To address this challenge, accelerating ALS on graphics processing units (GPUs) is a promising direction. We propose a novel approach that enhances MF efficiency via both memory optimization and approximate computing. The former exploits the GPU memory hierarchy to increase data reuse, while the latter reduces unnecessary computation without hurting the convergence of the learning algorithm. Extensive experiments on large-scale datasets show that our solution not only outperforms the competing CPU solutions by a large margin but also achieves a 2x-4x performance gain over state-of-the-art GPU solutions. Our implementations are open-sourced and publicly available.
Pages: 10
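As a rough illustration of the ALS scheme the abstract refers to, the sketch below alternates between solving for one factor matrix while the other is held fixed. It is a CPU-only NumPy sketch under stated assumptions, not the authors' GPU implementation: the function names `als` and `rmse`, the L2 regularization weight `reg`, and the boolean `mask` of observed entries are all hypothetical choices made for the example, and the memory optimization and approximate computing described in the abstract are not modeled here.

```python
import numpy as np


def als(R, mask, rank=2, reg=0.1, iters=20, seed=0):
    """Factor R ~ X @ Y.T using only the entries where mask is True.

    Illustrative assumption: plain regularized ALS with an exact solve
    for every row and column; this is not the paper's GPU kernel.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    X = rng.normal(scale=0.1, size=(m, rank))
    Y = rng.normal(scale=0.1, size=(n, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        # Fix Y: each row of X is an independent small least-squares problem.
        for u in range(m):
            Yu = Y[mask[u]]
            X[u] = np.linalg.solve(Yu.T @ Yu + ridge, Yu.T @ R[u, mask[u]])
        # Fix X: each row of Y (one per column of R) is solved the same way.
        for i in range(n):
            Xi = X[mask[:, i]]
            Y[i] = np.linalg.solve(Xi.T @ Xi + ridge, Xi.T @ R[mask[:, i], i])
    return X, Y


def rmse(R, mask, X, Y):
    """Root-mean-square error over the observed entries only."""
    err = (R - X @ Y.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))
```

Because every row update within a half-step depends only on the fixed factor, the inner loops are independent of one another, which is the property that makes ALS a natural fit for parallel hardware such as GPUs.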