Smoothness Matrices Beat Smoothness Constants: Better Communication Compression Techniques for Distributed Optimization

Cited by: 0
Authors
Safaryan, Mher [1]
Hanzely, Filip [2]
Richtarik, Peter [1]
Affiliations
[1] KAUST, Thuwal, Saudi Arabia
[2] TTIC, Chicago, IL, USA
Keywords
Coordinate descent
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large-scale distributed optimization has become the default tool for training supervised machine learning models with a large number of parameters and training data. Recent advancements in the field provide several mechanisms for speeding up the training, including compressed communication, variance reduction and acceleration. However, none of these methods is capable of exploiting the inherently rich, data-dependent smoothness structure of the local losses beyond standard smoothness constants. In this paper, we argue that when training supervised models, smoothness matrices (information-rich generalizations of the ubiquitous smoothness constants) can and should be exploited for further dramatic gains, both in theory and practice. In order to further alleviate the communication burden inherent in distributed optimization, we propose a novel communication sparsification strategy that can take full advantage of the smoothness matrices associated with local losses. To showcase the power of this tool, we describe how our sparsification technique can be adapted to three distributed optimization algorithms, namely DCGD [Khirirat et al., 2018], DIANA [Mishchenko et al., 2019] and ADIANA [Li et al., 2020], yielding significant savings in terms of communication complexity. The new methods always outperform the baselines, often dramatically so.
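For readers unfamiliar with the term, the two inequalities below contrast standard scalar smoothness with the matrix smoothness the abstract refers to; this is a minimal sketch in generic notation (f, x, h, L, \mathbf{L}), not an excerpt from the paper.

f(x + h) \le f(x) + \langle \nabla f(x), h \rangle + \tfrac{L}{2} \, \|h\|^2 \qquad \text{(scalar $L$-smoothness, $L > 0$)}

f(x + h) \le f(x) + \langle \nabla f(x), h \rangle + \tfrac{1}{2} \, \langle \mathbf{L} h, h \rangle \qquad \text{(matrix smoothness, $\mathbf{L} \succeq 0$)}

Taking \mathbf{L} = L \mathbf{I} recovers the scalar case; a general \mathbf{L} can additionally encode per-coordinate and cross-coordinate curvature, which is the data-dependent structure the abstract's sparsification strategy is designed to exploit.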
Pages: 15
Related Papers (10 in total)
  • [1] Qu, Guannan; Li, Na. Harnessing Smoothness to Accelerate Distributed Optimization. 2016 IEEE 55th Conference on Decision and Control (CDC), 2016: 159-166.
  • [2] Qu, Guannan; Li, Na. Harnessing Smoothness to Accelerate Distributed Optimization. IEEE Transactions on Control of Network Systems, 2018, 5(3): 1245-1260.
  • [3] Yi, Xinlei; Zhang, Shengjun; Yang, Tao; Chai, Tianyou; Johansson, Karl Henrik. Communication Compression for Distributed Nonconvex Optimization. IEEE Transactions on Automatic Control, 2023, 68(9): 5477-5492.
  • [4] Wang, Zeqin; Wen, Ming; Xu, Yuedong; Zhou, Yipeng; Wang, Jessie Hui; Zhang, Liang. Communication Compression Techniques in Distributed Deep Learning: A Survey. Journal of Systems Architecture, 2023, 142.
  • [5] Li, Zhize; Richtarik, Peter. CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  • [6] He, Yutong; Huang, Xinmeng; Yuan, Kun. Unbiased Compression Saves Communication in Distributed Optimization: When and How Much? Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [7] Zhang, Jiaqi; You, Keyou; Xie, Lihua. Innovation Compression for Communication-Efficient Distributed Optimization With Linear Convergence. IEEE Transactions on Automatic Control, 2023, 68(11): 6899-6906.
  • [8] Yang, Linfeng; Zhang, Zhen; Che, Keqin; Yang, Shaofu; Wang, Suyang. Communication-Efficient Distributed Minimax Optimization via Markov Compression. Neural Information Processing, ICONIP 2023, Part I, 2024, 14447: 540-551.
  • [9] Martins, Goncalo S.; Portugal, David; Rocha, Rui P. On the Usage of General-Purpose Compression Techniques for the Optimization of Inter-robot Communication. Informatics in Control, Automation and Robotics (ICINCO 2014), 2016, 370: 223-240.
  • [10] Beznosikov, Aleksandr; Gasnikov, Alexander. Compression and Data Similarity: Combination of Two Techniques for Communication-Efficient Solving of Distributed Variational Inequalities. Optimization and Applications, OPTIMA 2022, 2022, 13781: 151-162.