ACCELERATED, PARALLEL, AND PROXIMAL COORDINATE DESCENT

Cited: 157
Authors
Fercoq, Olivier [1]
Richtarik, Peter [2]
Affiliations
[1] Telecom ParisTech, Inst Mines Telecom, LTCI, Paris, France
[2] Univ Edinburgh, Sch Math, Edinburgh, Midlothian, Scotland
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
randomized coordinate descent; acceleration; parallel methods; proximal methods; complexity; partial separability; convex optimization; big data
DOI
10.1137/130949993
CLC Classification
O29 [Applied Mathematics]
Subject Classification
070104
Abstract
We propose a new randomized coordinate descent method for minimizing the sum of convex functions, each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel, and PROXimal; this is the first time such a method has been proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate $2\bar{\omega}\bar{L}R^2/(k+1)^2$, where $k$ is the iteration counter, $\bar{\omega}$ is a data-weighted average degree of separability of the loss function, $\bar{L}$ is the average of the Lipschitz constants associated with the coordinates and individual functions in the sum, and $R$ is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of accelerated coordinate descent. The fact that the method depends on the average degree of separability, rather than on the maximum degree, can be attributed to the use of new safe large stepsizes, leading to an improved expected separable overapproximation (ESO). These stepsizes are of independent interest and can be utilized in all existing parallel randomized coordinate descent algorithms based on the concept of ESO. In special cases, our method recovers several classical and recent algorithms, such as simple and accelerated proximal gradient descent, as well as serial, parallel, and distributed versions of randomized block coordinate descent. Our bounds match or improve on the best known bounds for these methods.
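To make the update pattern concrete, here is a minimal serial Python sketch (one coordinate per iteration, i.e., tau = 1) of an APPROX-style step applied to the lasso objective f(x) = 0.5*||Ax - b||^2 + lam*||x||_1. It is assembled from the description above rather than from the authors' code: the function name approx_lasso, the choice of objective, and the coordinate Lipschitz constants v_i = ||A_i||^2 are illustrative assumptions, and for readability the sketch performs the full-dimensional vector operations that the paper shows how to eliminate.

```python
import numpy as np

def approx_lasso(A, b, lam, iters=2000, seed=0):
    """Serial (tau = 1) APPROX-style accelerated proximal coordinate descent
    for f(x) = 0.5*||Ax - b||^2 + lam*||x||_1. Illustrative sketch only;
    assumes A has no all-zero columns."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    v = (A ** 2).sum(axis=0)          # coordinate Lipschitz constants v_i = ||A_i||^2
    x = np.zeros(n)
    z = x.copy()
    theta = 1.0 / n                   # theta_0 = tau / n with tau = 1
    for _ in range(iters):
        y = (1.0 - theta) * x + theta * z
        i = rng.integers(n)           # sample one coordinate uniformly at random
        g = A[:, i] @ (A @ y - b)     # partial derivative of the smooth part at y
        step = n * theta * v[i]       # scaling (n * theta_k * v_i) / tau with tau = 1
        # proximal coordinate update: soft-thresholding handles the l1 term
        t = z[i] - g / step
        z_new = np.sign(t) * max(abs(t) - lam / step, 0.0)
        x = y
        x[i] += n * theta * (z_new - z[i])  # x_{k+1} = y_k + (n*theta/tau)(z_{k+1} - z_k)
        z[i] = z_new
        theta = 0.5 * (np.sqrt(theta**4 + 4.0 * theta**2) - theta**2)
    return x
```

The recursion for theta is what produces the accelerated $O(1/k^2)$ rate quoted above; freezing theta at tau/n would instead recover a plain (non-accelerated) proximal coordinate descent step. In the parallel setting, a random subset of tau coordinates would be updated simultaneously, with the stepsize parameters v_i supplied by the ESO mentioned in the abstract.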
Pages: 1997 - 2023
Page count: 27
Related Papers
50 records in total
  • [1] Optimization in High Dimensions via Accelerated, Parallel, and Proximal Coordinate Descent
    Fercoq, Olivier
    Richtarik, Peter
    SIAM REVIEW, 2016, 58 (04) : 739 - 771
  • [2] Can random proximal coordinate descent be accelerated on nonseparable convex composite minimization problems?
    Chorobura, Flavia
    Glineur, Francois
    Necoara, Ion
    2023 EUROPEAN CONTROL CONFERENCE (ECC), 2023
  • [3] Asynchronous Delay-Aware Accelerated Proximal Coordinate Descent for Nonconvex Nonsmooth Problems
    Kazemi, Ehsan
    Wang, Liqiang
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI 2019), 2019, : 1528 - 1535
  • [4] On the complexity of parallel coordinate descent
    Tappenden, Rachael
    Takac, Martin
    Richtarik, Peter
    OPTIMIZATION METHODS & SOFTWARE, 2018, 33 (02): : 372 - 395
  • [5] Asynchronous Parallel Greedy Coordinate Descent
    You, Yang
    Lian, XiangRu
    Liu, Ji
    Yu, Hsiang-Fu
    Dhillon, Inderjit S.
    Demmel, James
    Hsieh, Cho-Jui
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [6] Parallel coordinate descent for the Adaboost problem
    Fercoq, Olivier
    2013 12TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2013), VOL 1, 2013, : 354 - 358
  • [7] Accelerated Line Search for Coordinate Descent Optimization
    Yu, Zhou
    Thibault, Jean-Baptiste
    Sauer, Ken
    Bouman, Charles
    Hsieh, Jiang
    2006 IEEE NUCLEAR SCIENCE SYMPOSIUM CONFERENCE RECORD, VOL 1-6, 2006, : 2841 - 2844
  • [8] An Accelerated Proximal Coordinate Gradient Method
    Lin, Qihang
    Lu, Zhaosong
    Xiao, Lin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [9] An asynchronous parallel stochastic coordinate descent algorithm
    Liu, Ji
    Wright, Stephen J.
    Ré, Christopher
    Bittorf, Victor
    Sridhar, Srikrishna
    JOURNAL OF MACHINE LEARNING RESEARCH, 2015, 16 : 285 - 322