Smoothness Matrices Beat Smoothness Constants: Better Communication Compression Techniques for Distributed Optimization

Cited by: 0
Authors
Safaryan, Mher [1]
Hanzely, Filip [2]
Richtarik, Peter [1]
Affiliations
[1] KAUST, Thuwal, Saudi Arabia
[2] TTIC, Chicago, IL, USA
Keywords
Coordinate descent
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large-scale distributed optimization has become the default tool for training supervised machine learning models with a large number of parameters and training data. Recent advancements in the field provide several mechanisms for speeding up the training, including compressed communication, variance reduction and acceleration. However, none of these methods is capable of exploiting the inherently rich, data-dependent smoothness structure of the local losses beyond standard smoothness constants. In this paper, we argue that when training supervised models, smoothness matrices (information-rich generalizations of the ubiquitous smoothness constants) can and should be exploited for further dramatic gains, both in theory and practice. In order to further alleviate the communication burden inherent in distributed optimization, we propose a novel communication sparsification strategy that can take full advantage of the smoothness matrices associated with local losses. To showcase the power of this tool, we describe how our sparsification technique can be adapted to three distributed optimization algorithms, namely DCGD [Khirirat et al., 2018], DIANA [Mishchenko et al., 2019] and ADIANA [Li et al., 2020], yielding significant savings in terms of communication complexity. The new methods always outperform the baselines, often dramatically so.
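For readers unfamiliar with the term, the two inequalities below contrast standard scalar smoothness with the matrix smoothness the abstract refers to; this is a minimal sketch in generic notation (f, x, h, L, \mathbf{L}), not an excerpt from the paper.

f(x + h) \le f(x) + \langle \nabla f(x), h \rangle + \tfrac{L}{2} \, \|h\|^2 \qquad \text{(scalar $L$-smoothness, $L > 0$)}

f(x + h) \le f(x) + \langle \nabla f(x), h \rangle + \tfrac{1}{2} \, \langle \mathbf{L} h, h \rangle \qquad \text{(matrix smoothness, $\mathbf{L} \succeq 0$)}

Taking \mathbf{L} = L \mathbf{I} recovers the scalar case; a general \mathbf{L} can additionally encode per-coordinate and cross-coordinate curvature, which is the data-dependent structure the abstract's sparsification strategy is designed to exploit.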
Pages: 15
Related Papers (10 in total)
  • [1] Qu, Guannan; Li, Na. Harnessing Smoothness to Accelerate Distributed Optimization. 2016 IEEE 55th Conference on Decision and Control (CDC), 2016: 159-166.
  • [2] Qu, Guannan; Li, Na. Harnessing Smoothness to Accelerate Distributed Optimization. IEEE Transactions on Control of Network Systems, 2018, 5(3): 1245-1260.
  • [3] Yi, Xinlei; Zhang, Shengjun; Yang, Tao; Chai, Tianyou; Johansson, Karl Henrik. Communication Compression for Distributed Nonconvex Optimization. IEEE Transactions on Automatic Control, 2023, 68(9): 5477-5492.
  • [4] Wang, Zeqin; Wen, Ming; Xu, Yuedong; Zhou, Yipeng; Wang, Jessie Hui; Zhang, Liang. Communication Compression Techniques in Distributed Deep Learning: A Survey. Journal of Systems Architecture, 2023, 142.
  • [5] Li, Zhize; Richtarik, Peter. CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  • [6] He, Yutong; Huang, Xinmeng; Yuan, Kun. Unbiased Compression Saves Communication in Distributed Optimization: When and How Much? Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [7] Zhang, Jiaqi; You, Keyou; Xie, Lihua. Innovation Compression for Communication-Efficient Distributed Optimization With Linear Convergence. IEEE Transactions on Automatic Control, 2023, 68(11): 6899-6906.
  • [8] Yang, Linfeng; Zhang, Zhen; Che, Keqin; Yang, Shaofu; Wang, Suyang. Communication-Efficient Distributed Minimax Optimization via Markov Compression. Neural Information Processing, ICONIP 2023, Part I, 2024, 14447: 540-551.
  • [9] Martins, Goncalo S.; Portugal, David; Rocha, Rui P. On the Usage of General-Purpose Compression Techniques for the Optimization of Inter-robot Communication. Informatics in Control, Automation and Robotics (ICINCO 2014), 2016, 370: 223-240.
  • [10] Beznosikov, Aleksandr; Gasnikov, Alexander. Compression and Data Similarity: Combination of Two Techniques for Communication-Efficient Solving of Distributed Variational Inequalities. Optimization and Applications, OPTIMA 2022, 2022, 13781: 151-162.