Distributed-Memory Parallel JointNMF

Cited by: 1

Authors:

Eswar, Srinivas [1]

Cobb, Benjamin [2]

Hayashi, Koby [2]

Kannan, Ramakrishnan [3]

Ballard, Grey [4]

Vuduc, Richard [2]

Park, Haesun [2]

Affiliations:

[1] Argonne Natl Lab, Lemont, IL 60439 USA

[2] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA

[3] Oak Ridge Natl Lab, Oak Ridge, TN USA

[4] Wake Forest Univ, Dept Comp Sci, Winston Salem, NC 27101 USA

Source:

PROCEEDINGS OF THE 37TH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2023 | 2023

Funding:

U.S. National Science Foundation; U.S. Department of Energy

Keywords:

High Performance Computing; Multimodal Inputs; Nonnegative Matrix Factorization; NONNEGATIVE MATRIX; COMMUNICATION; MPI;

DOI:

10.1145/3577193.3593733

Chinese Library Classification (CLC) number:

TP301 [Theory and Methods]

Discipline classification code:

081202

Abstract:

Joint Nonnegative Matrix Factorization (JointNMF) is a hybrid method for mining information from datasets that contain both feature and connection information. We propose distributed-memory parallelizations of three algorithms for solving the JointNMF problem, based on Alternating Nonnegative Least Squares (ANLS), Projected Gradient Descent, and Projected Gauss-Newton. We extend well-known communication-avoiding algorithms for the single-processor-grid case to our coupled case on two processor grids. We demonstrate the scalability of the algorithms on up to 960 cores (40 nodes) with 60% parallel efficiency. The more sophisticated ANLS and Gauss-Newton variants outperform the first-order gradient descent method in reducing the objective on large-scale problems. We perform a topic modelling task on a large corpus of academic papers that consists of over 37 million paper abstracts and nearly a billion citation relationships, demonstrating the utility and scalability of the methods.
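
As a rough illustration of the projected gradient descent variant named in the abstract, the following is a minimal serial NumPy sketch. It is not the distributed-memory algorithm of the paper. The abstract does not state the objective, so the form used here, ||X - WH||_F^2 + alpha * ||S - H^T H||_F^2 with W, H >= 0 (X holding features, S a symmetric connection matrix), is an assumption, as are the function names, the weight `alpha`, and the simple step-halving rule.

```python
import numpy as np


def objective(X, S, W, H, alpha):
    """Assumed JointNMF objective: feature fit plus weighted connection fit."""
    feature_term = np.linalg.norm(X - W @ H, "fro") ** 2
    connection_term = np.linalg.norm(S - H.T @ H, "fro") ** 2
    return feature_term + alpha * connection_term


def jointnmf_pgd(X, S, k, alpha=1.0, iters=50, seed=0):
    """Serial projected gradient descent sketch (hypothetical helper).

    X is m x n (features), S is n x n and symmetric (connections),
    W is m x k and H is k x n, both kept elementwise nonnegative.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    history = [objective(X, S, W, H, alpha)]
    step = 1.0
    for _ in range(iters):
        R = W @ H - X
        grad_W = 2.0 * R @ H.T
        # The factor 4 comes from H appearing twice in H^T H (S symmetric).
        grad_H = 2.0 * W.T @ R + 4.0 * alpha * H @ (H.T @ H - S)
        f_old = history[-1]
        # Halve the step until the projected update does not increase the objective.
        while True:
            W_new = np.maximum(W - step * grad_W, 0.0)  # project onto W >= 0
            H_new = np.maximum(H - step * grad_H, 0.0)  # project onto H >= 0
            f_new = objective(X, S, W_new, H_new, alpha)
            if f_new <= f_old or step < 1e-12:
                break
            step *= 0.5
        if f_new > f_old:
            break  # no acceptable step found; stop early
        W, H = W_new, H_new
        history.append(f_new)
        step *= 2.0  # cautiously try a larger step next iteration
    return W, H, history
```

A small usage example on synthetic data with a planted factorization:

```python
rng = np.random.default_rng(1)
W0, H0 = rng.random((20, 3)), rng.random((3, 15))
W, H, history = jointnmf_pgd(W0 @ H0, H0.T @ H0, k=3, iters=30)
print(history[0], history[-1])  # the objective should not increase
```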

Pages: 301-312

Page count: 12
