clMF: A fine-grained and portable alternating least squares algorithm for parallel matrix factorization

Cited by: 10
Authors
Chen, Jing [1]
Fang, Jianbin [1]
Liu, Weifeng [2]
Tang, Tao [1]
Yang, Canqun [1]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
[2] Norwegian Univ Sci & Technol, Dept Comp Sci, Trondheim, Norway
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2020 / Vol. 108
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Matrix factorization; Alternating least squares; Performance; RECOMMENDER; SYSTEMS; MEMORY;
DOI
10.1016/j.future.2018.04.071
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Alternating least squares (ALS) has proven to be an effective solver for matrix factorization in recommender systems. To speed up factorization, various parallel ALS solvers have been proposed to leverage modern multi-core and many-core processors. Existing implementations are limited in either speed or portability. In this paper, we present an efficient and portable ALS solver (clMF) for recommender systems. On the one hand, we diagnose the baseline implementation and observe that it lacks awareness of the hierarchical thread organization on modern hardware. To achieve high performance, we apply the thread batching technique, the fine-grained tiling technique and three architecture-specific optimizations. On the other hand, we implement the ALS solver in OpenCL so that it can run on various platforms (CPUs, GPUs and MICs). Based on the architectural specifics, we select a suitable code variant for each platform to map it efficiently to the underlying hardware. The experimental results show that our implementation runs 2.8x-15.7x faster on an Intel 16-core CPU, 23.9x-87.9x faster on an NVIDIA K20C GPU and 34.6x-97.1x faster on an AMD Fury X GPU than the baseline implementation. On the K20C GPU, our implementation also outperforms cuMF across latent feature counts ranging from 10 to 100 on various real-world recommendation datasets. (C) 2018 Elsevier B.V. All rights reserved.
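For readers unfamiliar with the method named in the abstract, the following is a minimal, sequential sketch of textbook ALS for rating-matrix factorization in plain Python. It is not the clMF implementation and does not reproduce its OpenCL kernels, thread batching, or tiling; the function names (`solve`, `update_factors`, `als`, `rmse`), the regularization weight `lam`, and the data layout are all assumptions chosen for illustration. The sketch alternates between fixing the item factors to solve for the user factors and vice versa. Each row update is an independent small linear system, which is the property parallel ALS solvers generally exploit.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def update_factors(observed_by_row, fixed, k, lam):
    """For every row, solve (sum_j y_j y_j^T + lam*I) x = sum_j r_j y_j,
    where j ranges over that row's observed entries (assumed formulation)."""
    updated = []
    for observed in observed_by_row:
        A = [[lam if p == q else 0.0 for q in range(k)] for p in range(k)]
        b = [0.0] * k
        for j, r in observed:
            y = fixed[j]
            for p in range(k):
                b[p] += r * y[p]
                for q in range(k):
                    A[p][q] += y[p] * y[q]
        updated.append(solve(A, b))  # rows are independent of each other
    return updated

def rmse(ratings, X, Y):
    """Root-mean-square error over the observed ratings only."""
    se = sum((r - sum(a * b for a, b in zip(X[u], Y[i]))) ** 2
             for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

def als(ratings, n_users, n_items, k=2, lam=0.1, iters=10, seed=0):
    """ratings: list of (user, item, value). Returns X, Y, and RMSE history."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Y = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    history = [rmse(ratings, X, Y)]
    for _ in range(iters):
        X = update_factors(by_user, Y, k, lam)  # fix Y, solve for X
        Y = update_factors(by_item, X, k, lam)  # fix X, solve for Y
        history.append(rmse(ratings, X, Y))
    return X, Y, history
```

Calling `als(ratings, n_users, n_items)` with a list of `(user, item, value)` triples returns the two factor matrices and the training RMSE after each sweep. Here `k` plays the role of the number of latent features that the abstract varies from 10 to 100.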
Pages: 1192-1205
Page count: 14
Related papers
50 in total
  • [11] Fine-grained parallel algorithm for unstructured surface mesh generation
    Zhao, Dawei
    Chen, Jianjun
    Zheng, Yao
    Huang, Zhengge
    Zheng, Jianjing
    COMPUTERS & STRUCTURES, 2015, 154 : 177 - 191
  • [12] Solving non-negative matrix factorization by alternating least squares with a modified strategy
    Hongwei Liu
    Xiangli Li
    Xiuyun Zheng
    Data Mining and Knowledge Discovery, 2013, 26 : 435 - 451
  • [13] Solving non-negative matrix factorization by alternating least squares with a modified strategy
    Liu, Hongwei
    Li, Xiangli
    Zheng, Xiuyun
    DATA MINING AND KNOWLEDGE DISCOVERY, 2013, 26 (03) : 435 - 451
  • [14] Alternating Iteratively Reweighted Least Squares Minimization for Low-Rank Matrix Factorization
    Giampouras, Paris V.
    Rontogiannis, Athanasios A.
    Koutroumbas, Konstantinos D.
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2019, 67 (02) : 490 - 503
  • [15] Regularized alternating least squares algorithms for non-negative matrix/tensor factorization
    Cichocki, Andrzej
    Zdunek, Rafal
    ADVANCES IN NEURAL NETWORKS - ISNN 2007, PT 3, PROCEEDINGS, 2007, 4493 : 793 - +
  • [16] Training Streaming Factorization Machines with Alternating Least Squares
    Mao, Xueyu
    Mitra, Saayan
    Li, Sheng
    PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 1185 - 1188
  • [17] Novel Alternating Least Squares Algorithm for Nonnegative Matrix and Tensor Factorizations
    Anh Huy Phan
    Cichocki, Andrzej
    Zdunek, Rafal
    Thanh Vu Dinh
    NEURAL INFORMATION PROCESSING: THEORY AND ALGORITHMS, PT I, 2010, 6443 : 262 - +
  • [18] Fine-Grained Bipartite Concept Factorization for Clustering
    Peng, Chong
    Zhang, Pengfei
    Chen, Yongyong
    Kang, Zhao
    Chen, Chenglizhao
    Cheng, Qiang
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 26254 - 26264
  • [19] Defect-tolerant, fine-grained parallel testing of a Cell Matrix
    Durbeck, LJK
    Macias, NJ
    RECONFIGURABLE TECHNOLOGY: FPGAS AND RECONFIGURABLE PROCESSORS FOR COMPUTING AND COMMUNICATIONS IV, 2002, 4867 : 71 - 85
  • [20] Fine-grained parallel boundary elements
    Davies, AJ
    ENGINEERING ANALYSIS WITH BOUNDARY ELEMENTS, 1997, 19 (01) : 13 - 16