clMF: A fine-grained and portable alternating least squares algorithm for parallel matrix factorization

Cited by: 10

Authors:

Chen, Jing [1]

Fang, Jianbin [1]

Liu, Weifeng [2]

Tang, Tao [1]

Yang, Canqun [1]

Affiliations:

[1] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China

[2] Norwegian Univ Sci & Technol, Dept Comp Sci, Trondheim, Norway

Source:

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2020 / Vol. 108

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China;

Keywords:

Matrix factorization; Alternating least squares; Performance; RECOMMENDER; SYSTEMS; MEMORY;

DOI:

10.1016/j.future.2018.04.071

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202 ;

Abstract:

Alternating least squares (ALS) has proven to be an effective solver for matrix factorization in recommender systems. To speed up factorization, various parallel ALS solvers have been proposed to leverage modern multi-core and many-core processors. Existing implementations are limited in either speed or portability. In this paper, we present an efficient and portable ALS solver (clMF) for recommender systems. On the one hand, we diagnose the baseline implementation and observe that it lacks awareness of the hierarchical thread organization on modern hardware. To achieve high performance, we apply the thread batching technique, the fine-grained tiling technique and three architecture-specific optimizations. On the other hand, we implement the ALS solver in OpenCL so that it can run on various platforms (CPUs, GPUs and MICs). Based on the architectural specifics, we select a suitable code variant for each platform to map it efficiently to the underlying hardware. The experimental results show that our implementation performs 2.8x-15.7x faster on an Intel 16-core CPU, 23.9x-87.9x faster on an NVIDIA K20C GPU and 34.6x-97.1x faster on an AMD Fury X GPU than the baseline implementation. On the K20C GPU, our implementation also outperforms cuMF for latent feature counts ranging from 10 to 100 on various real-world recommendation datasets. (C) 2018 Elsevier B.V. All rights reserved.
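
For readers unfamiliar with ALS, the following is a minimal, sequential NumPy sketch of the general technique the abstract refers to. It is not the clMF implementation and does not reproduce thread batching, tiling, or any OpenCL code. The function names (`als_factorize`, `rmse`), the regularization weight `lam`, and the boolean `mask` of observed ratings are assumptions chosen for illustration. The point it shows is that, with one factor matrix held fixed, each row of the other factor is an independent regularized least-squares problem, which is the property parallel ALS solvers can exploit.

```python
import numpy as np

def als_factorize(R, mask, k=4, lam=0.1, iters=20, seed=0):
    """Generic ALS sketch: approximate R by X @ Y.T on observed entries.

    R    : (m, n) rating matrix (values where mask is False are ignored)
    mask : (m, n) boolean array, True where a rating is observed
    k    : number of latent features (illustrative default)
    lam  : L2 regularization weight (illustrative default)
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    X = rng.standard_normal((m, k)) * 0.1  # row (user) factors
    Y = rng.standard_normal((n, k)) * 0.1  # column (item) factors
    reg = lam * np.eye(k)
    for _ in range(iters):
        # Fix Y; each row of X is an independent k-by-k linear system.
        for u in range(m):
            obs = mask[u]
            Yo = Y[obs]
            X[u] = np.linalg.solve(Yo.T @ Yo + reg, Yo.T @ R[u, obs])
        # Fix X; each row of Y is likewise independent.
        for i in range(n):
            obs = mask[:, i]
            Xo = X[obs]
            Y[i] = np.linalg.solve(Xo.T @ Xo + reg, Xo.T @ R[obs, i])
    return X, Y

def rmse(R, mask, X, Y):
    """Root-mean-square error over the observed entries only."""
    err = (R - X @ Y.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))
```

Because the two inner loops have no dependencies between iterations, they are the natural units to distribute across threads or work-items; how that mapping is done on each platform is the subject of the paper itself and is not shown here.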


Pages: 1192-1205

Page count: 14
