Fast and Robust Parallel SGD Matrix Factorization

Cited by: 30
Authors
Oh, Jinoh [1]
Han, Wook-Shin [1]
Yu, Hwanjo [1]
Jiang, Xiaoqian [2]
Affiliations
[1] Pohang Univ Sci & Technol POSTECH, Pohang, South Korea
[2] UCSD, La Jolla, CA USA
Funding
National Research Foundation of Singapore;
Keywords
Matrix factorization; Stochastic gradient descent;
DOI
10.1145/2783258.2783322
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Matrix factorization is one of the fundamental techniques for analyzing latent relationships between two entities. In particular, it is widely used for recommendation because of its high accuracy. Efficient parallel SGD matrix factorization algorithms have been developed for large matrices to speed up convergence. However, most of them are designed for shared-memory environments and thus fail to factorize matrices too large to fit in memory, and their performance is also unreliable when the matrix is skewed. This paper proposes a fast and robust parallel SGD matrix factorization algorithm, called MLGF-MF, which is robust to skewed matrices and runs efficiently in shared memory as well as on block-storage devices (e.g., SSDs). MLGF-MF uses the Multi-Level Grid File (MLGF) to partition the matrix and minimizes the cost of scheduling parallel SGD updates on the partitioned regions by exploiting partial match query processing. As a result, MLGF-MF produces reliable results efficiently even on skewed matrices. MLGF-MF is designed with asynchronous I/O built into the algorithm, so that the CPU keeps executing without waiting for I/O to complete. MLGF-MF thereby overlaps CPU and I/O processing, which offsets the I/O cost and maximizes CPU utilization. Recent flash SSDs support high-performance parallel I/O and are therefore well suited to asynchronous I/O. In our extensive evaluations, MLGF-MF significantly outperforms (i.e., converges faster than) the state-of-the-art algorithms in both shared-memory and block-storage environments. In addition, the outputs of MLGF-MF are significantly more robust to skewed matrices. Our implementation of MLGF-MF is available as executable files at http://dm.postech.ac.kr/MLGF-MF.
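The abstract refers to scheduling parallel SGD updates over partitioned regions of the matrix. A minimal sketch of that general idea follows, assuming a plain uniform B x B grid in place of the Multi-Level Grid File, an in-memory list of (row, column, value) triples, and sequential execution standing in for parallel workers. It models neither MLGF partial match queries nor asynchronous I/O, and it is not the authors' implementation. All names and hyperparameters (`sgd_update`, `grid_blocks`, `strata`, `factorize`, `lr`, `reg`, `B`) are hypothetical.

```python
# Hypothetical sketch: grid-scheduled SGD matrix factorization on a uniform grid.
# This is NOT MLGF-MF: no Multi-Level Grid File, no asynchronous I/O.
import random
from collections import defaultdict


def sgd_update(P, Q, i, j, r, lr, reg):
    """One SGD step on the regularized squared error of one observed entry R[i][j] = r."""
    p, q = P[i], Q[j]  # factor of row i and factor of column j (mutated in place)
    err = r - sum(a * b for a, b in zip(p, q))
    for k in range(len(p)):
        pk, qk = p[k], q[k]  # read old values so both updates use the same gradient
        p[k] += lr * (err * qk - reg * pk)
        q[k] += lr * (err * pk - reg * qk)


def grid_blocks(entries, m, n, B):
    """Bucket (i, j, r) triples of an m x n matrix into a uniform B x B grid."""
    blocks = defaultdict(list)
    for i, j, r in entries:
        blocks[(i * B // m, j * B // n)].append((i, j, r))
    return blocks


def strata(B):
    """Yield B groups of B blocks; blocks within a group share no row or column range."""
    for shift in range(B):
        yield [(b, (b + shift) % B) for b in range(B)]


def rmse(P, Q, entries):
    """Root-mean-square error of P Q^T on the observed entries."""
    se = sum((r - sum(a * b for a, b in zip(P[i], Q[j]))) ** 2 for i, j, r in entries)
    return (se / len(entries)) ** 0.5


def factorize(entries, m, n, rank=2, B=2, epochs=100, lr=0.05, reg=0.01, seed=0):
    """Grid-scheduled SGD; returns row factors P (m x rank) and column factors Q (n x rank)."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(m)]
    Q = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(n)]
    blocks = grid_blocks(entries, m, n, B)
    for _ in range(epochs):
        for group in strata(B):
            # Blocks in `group` touch disjoint rows of P and disjoint rows of Q
            # (i.e., disjoint columns of R), so each could go to a separate
            # worker without locking. This sketch runs them one after another.
            for blk in group:
                for i, j, r in blocks.get(blk, []):
                    sgd_update(P, Q, i, j, r, lr, reg)
    return P, Q


if __name__ == "__main__":
    # Toy 4 x 4 rank-1 matrix with one entry left unobserved.
    ratings = [(i, j, (i + 1) * (j + 1) / 4.0)
               for i in range(4) for j in range(4) if (i, j) != (0, 3)]
    P, Q = factorize(ratings, 4, 4)
    print("training RMSE:", round(rmse(P, Q, ratings), 4))
```

The cyclic-shift grouping is one common conflict-free schedule for a uniform grid. On a skewed matrix, equal-sized grid cells can hold very different numbers of nonzeros, so the blocks in one group may take very different amounts of time; this sketch makes no attempt to handle that and should not be read as how MLGF-MF partitions or schedules work.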
Pages: 865 - 874
Number of pages: 10
Related papers
50 in total
  • [1] Scalable Task-Parallel SGD on Matrix Factorization in Multicore Architectures
    Nishioka, Yusuke
    Taura, Kenjiro
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, 2015, : 1178 - 1184
  • [2] Anonymization Technique Based on SGD Matrix Factorization
    Mimoto, Tomoaki
    Hidano, Seira
    Kiyomoto, Shinsaku
    Miyaji, Atsuko
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (02) : 299 - 308
  • [3] Fast and Robust Recursive Algorithms for Separable Nonnegative Matrix Factorization
    Gillis, Nicolas
    Vavasis, Stephen A.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2014, 36 (04) : 698 - 714
  • [4] A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems
    Chin, Wei-Sheng
    Zhuang, Yong
    Juan, Yu-Chin
    Lin, Chih-Jen
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2015, 6 (01)
  • [5] Simplicial Nonnegative Matrix Tri-factorization: Fast Guaranteed Parallel Algorithm
    Nguyen, Duy-Khuong
    Tran-Dinh, Quoc
    Ho, Tu-Bao
    NEURAL INFORMATION PROCESSING, ICONIP 2016, PT II, 2016, 9948 : 117 - 125
  • [6] Fast Parallel Lyndon Factorization with Applications
    Apostolico, A
    Crochemore, M
    MATHEMATICAL SYSTEMS THEORY, 1995, 28 (02) : 89 - 108
  • [7] CuMF_SGD: Parallelized Stochastic Gradient Descent for Matrix Factorization on GPUs
    Xie, Xiaolong
    Tan, Wei
    Fong, Liana L.
    Liang, Yun
    HPDC'17: PROCEEDINGS OF THE 26TH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING, 2017, : 79 - 92
  • [8] Parallel Matrix Factorization for Binary Response
    Khanna, Rajiv
    Zhang, Liang
    Agarwal, Deepak
    Chen, Bee-chung
    2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2013,
  • [9] Parallel matrix factorization for recommender systems
    Yu, Hsiang-Fu
    Hsieh, Cho-Jui
    Si, Si
    Dhillon, Inderjit S.
    KNOWLEDGE AND INFORMATION SYSTEMS, 2014, 41 (03) : 793 - 819