clMF: A fine-grained and portable alternating least squares algorithm for parallel matrix factorization

Cited by: 10
Authors
Chen, Jing [1]
Fang, Jianbin [1]
Liu, Weifeng [2]
Tang, Tao [1]
Yang, Canqun [1]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
[2] Norwegian Univ Sci & Technol, Dept Comp Sci, Trondheim, Norway
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2020 / Vol. 108
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Matrix factorization; Alternating least squares; Performance; RECOMMENDER; SYSTEMS; MEMORY;
DOI
10.1016/j.future.2018.04.071
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Alternating least squares (ALS) has proven to be an effective solver for matrix factorization in recommender systems. To speed up factorization, various parallel ALS solvers have been proposed to leverage modern multi-core and many-core processors. Existing implementations are limited in either speed or portability. In this paper, we present an efficient and portable ALS solver (clMF) for recommender systems. On the one hand, we diagnose the baseline implementation and observe that it lacks awareness of the hierarchical thread organization on modern hardware. To achieve high performance, we apply the thread batching technique, the fine-grained tiling technique and three architecture-specific optimizations. On the other hand, we implement the ALS solver in OpenCL so that it can run on various platforms (CPUs, GPUs and MICs). Based on the architectural specifics, we select a suitable code variant for each platform to map it efficiently to the underlying hardware. The experimental results show that our implementation performs 2.8x-15.7x faster on an Intel 16-core CPU, 23.9x-87.9x faster on an NVIDIA K20C GPU and 34.6x-97.1x faster on an AMD Fury X GPU than the baseline implementation. On the K20C GPU, our implementation also outperforms cuMF for latent feature counts ranging from 10 to 100 on various real-world recommendation datasets. (C) 2018 Elsevier B.V. All rights reserved.
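For readers unfamiliar with ALS, the following is a minimal serial sketch of the general technique, not the clMF OpenCL implementation described in the abstract. It assumes plain L2 regularization with a single weight `lam`, and all function and parameter names (`solve_spd`, `update_side`, `als`, `objective`) are hypothetical, chosen for illustration. Each row's normal equations are solved independently of the others, which is the property parallel ALS solvers exploit.

```python
import random

def solve_spd(A, b):
    """Solve A x = b for a symmetric positive-definite A via Cholesky."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][t] * L[j][t] for t in range(j))
            L[i][j] = s ** 0.5 if i == j else s / L[j][j]
    y = [0.0] * n  # forward substitution: L y = b
    for i in range(n):
        y[i] = (b[i] - sum(L[i][t] * y[t] for t in range(i))) / L[i][i]
    x = [0.0] * n  # back substitution: L^T x = y
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[t][i] * x[t] for t in range(i + 1, n))) / L[i][i]
    return x

def update_side(obs_by_row, fixed, k, lam):
    """Recompute one factor matrix while the other (`fixed`) is held constant.

    For each row, solve (sum_j y_j y_j^T + lam * I) x = sum_j r_j y_j,
    where j ranges over that row's observed ratings.
    """
    updated = []
    for obs in obs_by_row:  # rows are independent of each other
        A = [[lam if p == q else 0.0 for q in range(k)] for p in range(k)]
        b = [0.0] * k
        for j, r in obs:
            y = fixed[j]
            for p in range(k):
                b[p] += r * y[p]
                for q in range(k):
                    A[p][q] += y[p] * y[q]
        updated.append(solve_spd(A, b))
    return updated

def objective(ratings, X, Y, lam):
    """Regularized squared error over the observed ratings."""
    err = sum((r - sum(a * c for a, c in zip(X[i], Y[j]))) ** 2
              for i, j, r in ratings)
    reg = sum(v * v for row in X + Y for v in row)
    return err + lam * reg

def als(ratings, m, n, k=2, lam=0.1, iters=10, seed=0):
    """Factor an m-by-n rating matrix given as (row, col, rating) triples."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(k)] for _ in range(m)]
    Y = [[rng.random() for _ in range(k)] for _ in range(n)]
    by_user = [[] for _ in range(m)]
    by_item = [[] for _ in range(n)]
    for i, j, r in ratings:
        by_user[i].append((j, r))
        by_item[j].append((i, r))
    for _ in range(iters):
        X = update_side(by_user, Y, k, lam)  # fix Y, solve for X
        Y = update_side(by_item, X, k, lam)  # fix X, solve for Y
    return X, Y
```

Calling `als(ratings, m, n)` with a list of `(user, item, rating)` triples returns the two factor matrices as lists of length-`k` rows. Because each half-step exactly minimizes the regularized objective over one factor matrix, the value returned by `objective` does not increase from one iteration to the next.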
Pages: 1192 - 1205
Page count: 14