Learning-to-Learn Stochastic Gradient Descent with Biased Regularization

Cited by: 0
Authors
Denevi, Giulia [1,2]
Ciliberto, Carlo [3,4]
Grazzi, Riccardo [1,4]
Pontil, Massimiliano [1,4]
Affiliations
[1] Ist Italiano Tecnol, Genoa, Italy
[2] Univ Genoa, Genoa, Italy
[3] Imperial Coll London, London, England
[4] UCL, London, England
Keywords
ALGORITHM; STABILITY; BOUNDS
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We study the problem of learning-to-learn: inferring a learning algorithm that works well on a family of tasks sampled from an unknown distribution. As the class of algorithms, we consider Stochastic Gradient Descent (SGD) on the true risk regularized by the squared Euclidean distance from a bias vector. We present an average excess risk bound for such a learning algorithm that quantifies the potential benefit of using a bias vector with respect to the unbiased case. We then propose a novel meta-algorithm to estimate the bias term online from a sequence of observed tasks. The small memory footprint and low time complexity of our approach make it appealing in practice, while our theoretical analysis provides guarantees on the generalization properties of the meta-algorithm on new tasks. A key feature of our results is that, when the number of tasks grows and their variance is relatively small, our learning-to-learn approach has a significant advantage over learning each task in isolation by standard SGD without a bias term. Numerical experiments demonstrate the effectiveness of our approach in practice.
Pages: 10
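For intuition on the method summarized in the abstract, the sketch below (Python/NumPy) illustrates the two levels involved: within-task SGD on a risk regularized by the squared Euclidean distance to a bias vector h, and an online update of h across a stream of tasks. Function names, the squared loss, and the simplified meta-step h <- h - eta * lam * (h - w_bar) are illustrative assumptions, not the authors' exact algorithm or constants.

```python
import numpy as np

def sgd_biased_reg(task, h, lam=0.1, gamma=0.01, n_passes=1, seed=0):
    """Within-task SGD on an empirical risk (squared loss, as an illustration)
    plus the biased regularizer (lam / 2) * ||w - h||^2, started at h.
    Returns the average SGD iterate."""
    X, y = task
    w = h.copy()
    iterates = []
    rng = np.random.default_rng(seed)
    for _ in range(n_passes):
        for i in rng.permutation(len(y)):
            grad_loss = (X[i] @ w - y[i]) * X[i]          # gradient of 0.5 * (x.w - y)^2
            w = w - gamma * (grad_loss + lam * (w - h))   # plus gradient of the biased regularizer
            iterates.append(w.copy())
    return np.mean(iterates, axis=0)

def estimate_bias_online(tasks, dim, lam=0.1, gamma=0.01, eta=0.05):
    """Online estimation of the bias h from a stream of tasks.
    The meta-step pulls h toward each task's averaged solution; it is a
    simplified proxy for the paper's online meta-gradient update."""
    h = np.zeros(dim)
    for task in tasks:
        w_bar = sgd_biased_reg(task, h, lam=lam, gamma=gamma)
        h = h - eta * lam * (h - w_bar)
    return h
```

Starting each task's SGD at the current bias and averaging the iterates mirrors the intuition in the abstract: when tasks are similar (small variance around a common vector), a well-chosen bias lets within-task SGD reach low excess risk faster than unbiased SGD run on each task in isolation.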
Related Papers
50 records in total (10 shown)
  • [1] Learning to learn by gradient descent by gradient descent
    Andrychowicz, Marcin
    Denil, Misha
    Colmenarejo, Sergio Gomez
    Hoffman, Matthew W.
    Pfau, David
    Schaul, Tom
    Shillingford, Brendan
    de Freitas, Nando
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [2] Learning to Learn without Gradient Descent by Gradient Descent
    Chen, Yutian
    Hoffman, Matthew W.
    Colmenarejo, Sergio Gomez
    Denil, Misha
    Lillicrap, Timothy P.
    Botvinick, Matt
    de Freitas, Nando
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [3] Learning to Learn Gradient Aggregation by Gradient Descent
    Ji, Jinlong
    Chen, Xuhui
    Wang, Qianlong
    Yu, Lixing
    Li, Pan
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2614 - 2620
  • [4] Learning to learn using gradient descent
    Hochreiter, S
    Younger, AS
    Conwell, PR
    [J]. ARTIFICIAL NEURAL NETWORKS-ICANN 2001, PROCEEDINGS, 2001, 2130 : 87 - 94
  • [5] Towards learning-to-learn
    Lansdell, Benjamin James
    Kording, Konrad Paul
    [J]. CURRENT OPINION IN BEHAVIORAL SCIENCES, 2019, 29 : 45 - 50
  • [6] Incremental Learning-to-Learn with Statistical Guarantees
    Denevi, Giulia
    Ciliberto, Carlo
    Stamos, Dimitris
    Pontil, Massimiliano
    [J]. UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2018, : 457 - 466
  • [7] MetaFaaS: Learning-to-learn on Serverless
    Pimpalkhute, Varad
    Kunde, Shruti
    Singhal, Rekha
    Palepu, Surya
    Chahal, Dheeraj
    Pandit, Amey
    [J]. PROCEEDINGS OF THE INTERNATIONAL WORKSHOP ON BIG DATA IN EMERGENT DISTRIBUTED ENVIRONMENTS (BIDEDE 2022), 2022,
  • [8] Batch-Less Stochastic Gradient Descent for Compressive Learning of Deep Regularization for Image Denoising
    Shi, Hui
    Traonmilin, Yann
    Aujol, Jean-Francois
    [J]. JOURNAL OF MATHEMATICAL IMAGING AND VISION, 2024, 66 (04) : 464 - 477
  • [9] ON THE REGULARIZATION EFFECT OF STOCHASTIC GRADIENT DESCENT APPLIED TO LEAST-SQUARES
    Steinerberger, Stefan
    [J]. ELECTRONIC TRANSACTIONS ON NUMERICAL ANALYSIS, 2021, 54 : 610 - 619
  • [10] Learning-to-learn efficiently with self-learning
    Kunde, Shruti
    Choudhry, Sharod Roy
    Pandit, Amey
    Singhal, Rekha
    [J]. PROCEEDINGS OF THE 6TH WORKSHOP ON DATA MANAGEMENT FOR END-TO-END MACHINE LEARNING, DEEM 2022, 2022,