Shared-memory and shared-nothing stochastic gradient descent algorithms for matrix completion

Cited: 0
Authors
Faraz Makari
Christina Teflioudi
Rainer Gemulla
Peter Haas
Yannis Sismanis
Affiliations
[1] Max Planck Institute for Computer Science
[2] IBM Almaden Research Center
[3] Google
Keywords
Parallel and distributed matrix completion; Low-rank matrix factorization; Stochastic gradient descent; Recommender systems
DOI: not available
Abstract
We provide parallel algorithms for large-scale matrix completion on problems with millions of rows, millions of columns, and billions of revealed entries. We focus on in-memory algorithms that run either in a shared-memory environment on a powerful compute node or in a shared-nothing environment on a small cluster of commodity nodes; even very large problems can be handled effectively in these settings. Our ASGD, DSGD-MR, DSGD++, and CSGD algorithms are novel variants of the popular stochastic gradient descent (SGD) algorithm, with the latter three algorithms based on a new "stratified SGD" approach. All of the algorithms are cache-friendly and exploit thread-level parallelism, in-memory processing, and asynchronous communication. We investigate the performance of both new and existing algorithms via a theoretical complexity analysis and a set of large-scale experiments. The results show that CSGD is more scalable, and up to 60% faster, than the best-performing alternative method in the shared-memory setting. DSGD++ is superior in terms of overall runtime, memory consumption, and scalability in the shared-nothing setting. For example, DSGD++ can solve a difficult matrix completion problem on a high-variance matrix with 10M rows, 1M columns, and 10B revealed entries in around 40 min on 16 compute nodes. In general, algorithms based on SGD appear to perform better than algorithms based on alternating minimization, such as the PALS and DALS alternating least-squares algorithms.
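The parallel variants described in the abstract all build on the basic SGD update for the low-rank factorization loss: each revealed entry contributes a gradient step on the two factor rows it touches. A minimal sequential sketch of that core update (an illustration only, not the paper's ASGD/DSGD++/CSGD algorithms; function name and hyperparameters are assumptions) might look like:

```python
import numpy as np

def sgd_matrix_completion(rows, cols, vals, n_rows, n_cols,
                          rank=3, epochs=200, lr=0.02, reg=0.01, seed=0):
    """Approximate a partially observed matrix V by L @ R.T, running
    plain sequential SGD over the revealed entries (rows[k], cols[k])."""
    rng = np.random.default_rng(seed)
    L = rng.normal(scale=0.1, size=(n_rows, rank))
    R = rng.normal(scale=0.1, size=(n_cols, rank))
    order = np.arange(len(vals))
    for _ in range(epochs):
        rng.shuffle(order)  # visit revealed entries in random order each epoch
        for k in order:
            i, j = rows[k], cols[k]
            err = vals[k] - L[i] @ R[j]
            # simultaneous gradient step on the L2-regularized squared loss
            # (both right-hand sides are evaluated before either row is updated)
            L[i], R[j] = (L[i] + lr * (err * R[j] - reg * L[i]),
                          R[j] + lr * (err * L[i] - reg * R[j]))
    return L, R
```

The stratified schemes in the paper parallelize exactly this inner loop by partitioning the entries into blocks ("strata") whose rows and columns do not overlap, so that concurrent workers never update the same factor row.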
Pages: 493–523
Page count: 30
Related papers
50 records in total
  • [21] VQ compression algorithms on a shared-memory multiprocessor system
    Wakatani, Akiyoshi
    [J]. DCC 2006: Data Compression Conference, Proceedings, 2006, : 470 - 470
  • [22] THE JOIN ALGORITHMS ON A SHARED-MEMORY MULTIPROCESSOR DATABASE MACHINE
    QADAH, GZ
    IRANI, KB
    [J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1988, 14 (11) : 1668 - 1683
  • [23] QR FACTORIZATION OF A DENSE MATRIX ON A SHARED-MEMORY MULTIPROCESSOR
    CHU, E
    GEORGE, A
    [J]. PARALLEL COMPUTING, 1989, 11 (01) : 55 - 71
  • [24] Matrix multiplication performance on commodity shared-memory multiprocessors
    Tsilikas, G
    Fleury, M
    [J]. INTERNATIONAL CONFERENCE ON PARALLEL COMPUTING IN ELECTRICAL ENGINEERING, 2004, : 13 - 18
  • [25] Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster
    Chakraborty, Abhirup
    Singh, Ajit
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2013,
  • [26] Dynamic Physiological Partitioning on a Shared-Nothing Database Cluster
    Schall, Daniel
    Härder, Theo
    [J]. 2015 IEEE 31ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2015, : 1095 - 1106
  • [27] Shared-Nothing Distributed Enumeration of 2-Plexes
    Conte, Alessio
    Firmani, Donatella
    Patrignani, Maurizio
    Torlone, Riccardo
    [J]. PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19), 2019, : 2469 - 2472
  • [28] Parallel DBMS architecture supporting shared-nothing computers
    Jin, Shudong
    Feng, Yucai
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 1998, 35 (06): : 520 - 524
  • [29] FAST DIGIT-REVERSAL ALGORITHMS ON A SHARED-MEMORY MACHINE
    SEGUEL, J
    BOLLMAN, D
    [J]. PARALLEL COMPUTING, 1994, 20 (01) : 93 - 99
  • [30] Shared-Memory Multi-Processor Scheduling Algorithms for CCSP
    Ritson, Carl G.
    [J]. WOTUG-30: COMMUNICATING PROCESS ARCHITECTURES 2007, 2007, 65 : 509 - 509