USING TAYLOR-APPROXIMATED GRADIENTS TO IMPROVE THE FRANK-WOLFE METHOD FOR EMPIRICAL RISK MINIMIZATION

Cited: 0
Authors
Xiong, Zikai [1 ]
Freund, Robert M. [2 ]
Affiliations
[1] MIT, Operations Research Center, Cambridge, MA 02139 USA
[2] MIT, Sloan School of Management, Cambridge, MA 02139 USA
Keywords
Frank-Wolfe; linear minimization oracle; empirical risk minimization; linear prediction; computational complexity; convex optimization
DOI
10.1137/22M1519286
Chinese Library Classification
O29 [Applied Mathematics]
Subject Classification Code
070104
Abstract
The Frank-Wolfe method has become increasingly useful in statistical and machine learning applications due to the structure-inducing properties of its iterates, and especially in settings where linear minimization over the feasible set is more computationally efficient than projection. In the setting of empirical risk minimization, one of the fundamental optimization problems in statistics and machine learning, the computational cost of Frank-Wolfe methods typically grows linearly in the number of data observations n. This is in stark contrast to the case for typical stochastic projection methods. In order to reduce this dependence on n, we look to the second-order smoothness of typical smooth loss functions (least squares loss and logistic loss, for example) and propose amending the Frank-Wolfe method with Taylor series-approximated gradients, including variants for both deterministic and stochastic settings. Compared with current state-of-the-art methods in the regime where the optimality tolerance ε is sufficiently small, our methods reduce the dependence on large n while simultaneously obtaining the optimal convergence rates of Frank-Wolfe methods in both convex and nonconvex settings. We also propose a novel adaptive step-size approach for which we have computational guarantees. Finally, we present computational experiments which show that our methods exhibit significant speedups over existing methods on real-world datasets for both convex and nonconvex binary classification problems.
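To make the core idea concrete, the following is a minimal NumPy sketch, not the authors' algorithm: a Frank-Wolfe loop over an l1-ball in which the exact gradient is computed only at periodic anchor points, and intermediate gradients are formed from the first-order Taylor expansion grad f(x) ≈ grad f(x0) + H(x0)(x - x0). Least squares is used because its Hessian is constant, so the expansion is exact here; for other second-order-smooth losses (for example, logistic loss) it is an approximation, which is the regime the paper's deterministic and stochastic variants handle with guarantees. All names (A, b, radius, refresh_every) and the classic 2/(t+2) step size, rather than the paper's adaptive step-size rule, are illustrative assumptions.

    import numpy as np

    def lmo_l1_ball(g, radius):
        # Linear minimization oracle over the l1-ball of the given radius:
        # argmin_{||v||_1 <= radius} <g, v> is attained at a signed coordinate vertex.
        j = np.argmax(np.abs(g))
        v = np.zeros_like(g)
        v[j] = -radius * np.sign(g[j])
        return v

    def taylor_fw_least_squares(A, b, radius, iters=200, refresh_every=20):
        # Frank-Wolfe for min_x (1/2n) ||Ax - b||^2 over {x : ||x||_1 <= radius},
        # using a Taylor-approximated gradient between anchor refreshes.
        n, d = A.shape
        x = np.zeros(d)
        H0 = (A.T @ A) / n                # Hessian; constant for least squares
        for t in range(iters):
            if t % refresh_every == 0:    # exact gradient at the anchor: one O(nd) pass over the data
                x0 = x.copy()
                g0 = A.T @ (A @ x0 - b) / n
            g = g0 + H0 @ (x - x0)        # Taylor-approximated gradient: no pass over the n observations
            v = lmo_l1_ball(g, radius)    # call the linear minimization oracle
            gamma = 2.0 / (t + 2)         # classic Frank-Wolfe step size (the paper proposes an adaptive rule)
            x = x + gamma * (v - x)
        return x

    # Illustrative usage on synthetic data:
    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 50))
    b = A @ rng.standard_normal(50) + 0.1 * rng.standard_normal(1000)
    x_hat = taylor_fw_least_squares(A, b, radius=5.0)

Because the Hessian is constant here, the sketch mainly illustrates the mechanics; the paper's contribution lies in controlling the approximation error of such expansions for general smooth losses so that full passes over the n observations are needed only rarely.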
Pages: 2503-2534
Page count: 32
Related Papers
3 records
  • [1] A Newton Frank-Wolfe method for constrained self-concordant minimization
    Liu, Deyi
    Cevher, Volkan
    Tran-Dinh, Quoc
    JOURNAL OF GLOBAL OPTIMIZATION, 2022, 83 (2): 273-299
  • [2] Speeding up the Frank-Wolfe method using the Orthogonal Jacobi polynomials
    Francis, Robin
    Chepuri, Sundeep Prabhakar
    2022 56TH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, 2022: 1081-1085
  • [3] Semi-supervised empirical risk minimization: Using unlabeled data to improve prediction
    Yuval, Oren
    Rosset, Saharon
    ELECTRONIC JOURNAL OF STATISTICS, 2022, 16 (1): 1434-1460