Learning-to-Learn Stochastic Gradient Descent with Biased Regularization

Cited by: 0
Authors
Denevi, Giulia [1,2]
Ciliberto, Carlo [3,4]
Grazzi, Riccardo [1,4]
Pontil, Massimiliano [1,4]
Affiliations
[1] Ist Italiano Tecnol, Genoa, Italy
[2] Univ Genoa, Genoa, Italy
[3] Imperial Coll London, London, England
[4] UCL, London, England
Keywords
ALGORITHM; STABILITY; BOUNDS
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We study the problem of learning-to-learn: inferring a learning algorithm that works well on a family of tasks sampled from an unknown distribution. As the class of algorithms, we consider Stochastic Gradient Descent (SGD) on the true risk regularized by the squared Euclidean distance from a bias vector. We present an average excess risk bound for such a learning algorithm that quantifies the potential benefit of using a bias vector with respect to the unbiased case. We then propose a novel meta-algorithm to estimate the bias term online from a sequence of observed tasks. The small memory footprint and low time complexity of our approach make it appealing in practice, while our theoretical analysis provides guarantees on the generalization properties of the meta-algorithm on new tasks. A key feature of our results is that, when the number of tasks grows and their variance is relatively small, our learning-to-learn approach has a significant advantage over learning each task in isolation by standard SGD without a bias term. Numerical experiments demonstrate the effectiveness of our approach in practice.
Pages: 10
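For intuition on the method summarized in the abstract, the sketch below (Python/NumPy) illustrates the two levels involved: within-task SGD on a risk regularized by the squared Euclidean distance to a bias vector h, and an online update of h across a stream of tasks. Function names, the squared loss, and the simplified meta-step h <- h - eta * lam * (h - w_bar) are illustrative assumptions, not the authors' exact algorithm or constants.

```python
import numpy as np

def sgd_biased_reg(task, h, lam=0.1, gamma=0.01, n_passes=1, seed=0):
    """Within-task SGD on an empirical risk (squared loss, as an illustration)
    plus the biased regularizer (lam / 2) * ||w - h||^2, started at h.
    Returns the average SGD iterate."""
    X, y = task
    w = h.copy()
    iterates = []
    rng = np.random.default_rng(seed)
    for _ in range(n_passes):
        for i in rng.permutation(len(y)):
            grad_loss = (X[i] @ w - y[i]) * X[i]          # gradient of 0.5 * (x.w - y)^2
            w = w - gamma * (grad_loss + lam * (w - h))   # plus gradient of the biased regularizer
            iterates.append(w.copy())
    return np.mean(iterates, axis=0)

def estimate_bias_online(tasks, dim, lam=0.1, gamma=0.01, eta=0.05):
    """Online estimation of the bias h from a stream of tasks.
    The meta-step pulls h toward each task's averaged solution; it is a
    simplified proxy for the paper's online meta-gradient update."""
    h = np.zeros(dim)
    for task in tasks:
        w_bar = sgd_biased_reg(task, h, lam=lam, gamma=gamma)
        h = h - eta * lam * (h - w_bar)
    return h
```

Starting each task's SGD at the current bias and averaging the iterates mirrors the intuition in the abstract: when tasks are similar (small variance around a common vector), a well-chosen bias lets within-task SGD reach low excess risk faster than unbiased SGD run on each task in isolation.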
Related Papers
50 records in total (10 shown)
  • [1] Learning to learn by gradient descent by gradient descent
    Andrychowicz, Marcin
    Denil, Misha
    Colmenarejo, Sergio Gomez
    Hoffman, Matthew W.
    Pfau, David
    Schaul, Tom
    Shillingford, Brendan
    de Freitas, Nando
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [2] Learning to Learn without Gradient Descent by Gradient Descent
    Chen, Yutian
    Hoffman, Matthew W.
    Colmenarejo, Sergio Gomez
    Denil, Misha
    Lillicrap, Timothy P.
    Botvinick, Matt
    de Freitas, Nando
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [3] Learning to Learn Gradient Aggregation by Gradient Descent
    Ji, Jinlong
    Chen, Xuhui
    Wang, Qianlong
    Yu, Lixing
    Li, Pan
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2614 - 2620
  • [4] Learning to learn using gradient descent
    Hochreiter, S
    Younger, AS
    Conwell, PR
    [J]. ARTIFICIAL NEURAL NETWORKS-ICANN 2001, PROCEEDINGS, 2001, 2130 : 87 - 94
  • [5] Towards learning-to-learn
    Lansdell, Benjamin James
    Kording, Konrad Paul
    [J]. CURRENT OPINION IN BEHAVIORAL SCIENCES, 2019, 29 : 45 - 50
  • [6] Incremental Learning-to-Learn with Statistical Guarantees
    Denevi, Giulia
    Ciliberto, Carlo
    Stamos, Dimitris
    Pontil, Massimiliano
    [J]. UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2018, : 457 - 466
  • [7] MetaFaaS: Learning-to-learn on Serverless
    Pimpalkhute, Varad
    Kunde, Shruti
    Singhal, Rekha
    Palepu, Surya
    Chahal, Dheeraj
    Pandit, Amey
    [J]. PROCEEDINGS OF THE INTERNATIONAL WORKSHOP ON BIG DATA IN EMERGENT DISTRIBUTED ENVIRONMENTS (BIDEDE 2022), 2022,
  • [8] Batch-Less Stochastic Gradient Descent for Compressive Learning of Deep Regularization for Image Denoising
    Shi, Hui
    Traonmilin, Yann
    Aujol, Jean-Francois
    [J]. JOURNAL OF MATHEMATICAL IMAGING AND VISION, 2024, 66 (04) : 464 - 477
  • [9] ON THE REGULARIZATION EFFECT OF STOCHASTIC GRADIENT DESCENT APPLIED TO LEAST-SQUARES
    Steinerberger, Stefan
    [J]. ELECTRONIC TRANSACTIONS ON NUMERICAL ANALYSIS, 2021, 54 : 610 - 619
  • [10] Learning-to-learn efficiently with self-learning
    Kunde, Shruti
    Choudhry, Sharod Roy
    Pandit, Amey
    Singhal, Rekha
    [J]. PROCEEDINGS OF THE 6TH WORKSHOP ON DATA MANAGEMENT FOR END-TO-END MACHINE LEARNING, DEEM 2022, 2022,