Learning-to-Learn Stochastic Gradient Descent with Biased Regularization

Cited: 0
Authors
Denevi, Giulia [1,2]
Ciliberto, Carlo [3,4]
Grazzi, Riccardo [1,4]
Pontil, Massimiliano [1,4]
Affiliations
[1] Ist Italiano Tecnol, Genoa, Italy
[2] Univ Genoa, Genoa, Italy
[3] Imperial Coll London, London, England
[4] UCL, London, England
Keywords
ALGORITHM; STABILITY; BOUNDS
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We study the problem of learning-to-learn: inferring a learning algorithm that works well on a family of tasks sampled from an unknown distribution. As the class of algorithms, we consider Stochastic Gradient Descent (SGD) on the true risk regularized by the squared Euclidean distance from a bias vector. We present an average excess risk bound for such a learning algorithm that quantifies the potential benefit of using a bias vector with respect to the unbiased case. We then propose a novel meta-algorithm to estimate the bias term online from a sequence of observed tasks. The small memory footprint and low time complexity of our approach make it appealing in practice, while our theoretical analysis provides guarantees on the generalization properties of the meta-algorithm on new tasks. A key feature of our results is that, when the number of tasks grows and their variance is relatively small, our learning-to-learn approach has a significant advantage over learning each task in isolation by standard SGD without a bias term. Numerical experiments demonstrate the effectiveness of our approach in practice.
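To make the setup in the abstract concrete, below is a minimal, illustrative Python sketch of within-task SGD with biased regularization, together with a simple online estimate of the bias across tasks. It is written for least-squares tasks; the function name sgd_biased, the step size, the averaging of within-task iterates, and the running-average meta-update are illustrative assumptions, not the paper's exact meta-algorithm.

```python
import numpy as np

def sgd_biased(X, y, h, lam=1.0, eta=0.01, epochs=1):
    """SGD on the empirical least-squares risk plus (lam/2) * ||w - h||^2.

    The regularizer pulls the iterates toward the bias vector h; setting
    h = 0 recovers plain regularized SGD (the unbiased case).
    """
    w = h.copy()                      # warm-start at the bias (an assumption)
    iterates = []
    for _ in range(epochs):
        for i in np.random.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i] + lam * (w - h)
            w = w - eta * grad
            iterates.append(w)
    return np.mean(iterates, axis=0)  # averaged iterate, standard in SGD analyses

# Hypothetical meta-loop: estimate the bias online as a running average of the
# per-task solutions -- a simplification of the paper's online meta-algorithm.
rng = np.random.default_rng(0)
d = 5
h = np.zeros(d)
w_center = rng.normal(size=d)                      # common center of the task family
for t in range(1, 51):
    w_task = w_center + 0.1 * rng.normal(size=d)   # low task variance
    X = rng.normal(size=(30, d))
    y = X @ w_task + 0.01 * rng.normal(size=30)
    w_hat = sgd_biased(X, y, h)
    h = h + (w_hat - h) / t                        # online running-average update
```

With h estimated this way, each new task is warm-started at, and pulled toward, a point near the common task center. This is the regime the abstract describes: when the variance across tasks is small, the biased regularizer should outperform learning each task in isolation with h = 0.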
Pages: 10
Related Papers
50 records in total
  • [41] A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces
    Le Lan, Charline
    Greaves, Joshua
    Farebrother, Jesse
    Rowland, Mark
    Pedregosa, Fabian
    Agarwal, Rishabh
    Bellemare, Marc
    arXiv, 2022.
  • [42] Learning-to-learn: the cognitive skills and the beliefs in the assessment of schooling
    Hautamäki, J
    INTERNATIONAL JOURNAL OF PSYCHOLOGY, 2000, 35(3-4): 299-299.
  • [43] Regularization Effect of Random Node Fault/Noise on Gradient Descent Learning Algorithm
    Sum, John
    Leung, Chi-Sing
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34(5): 2619-2632.
  • [44] Unforgeability in Stochastic Gradient Descent
    Baluta, Teodora
    Nikolic, Ivica
    Jain, Racchit
    Aggarwal, Divesh
    Saxena, Prateek
    PROCEEDINGS OF THE 2023 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, CCS 2023, 2023: 1138-1152.
  • [45] Preconditioned Stochastic Gradient Descent
    Li, Xi-Lin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29(5): 1454-1466.
  • [46] Learning-to-Learn Agent Adaptation Policy for Abstractive Summarization
    Mu, Hongzhang
    Liu, Tingwen
    Xu, Hongbo
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022.
  • [47] Stochastic Reweighted Gradient Descent
    El Hanchi, Ayoub
    Stephens, David A.
    Maddison, Chris J.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022.
  • [48] Stochastic gradient descent tricks
    Bottou, Léon
    LECTURE NOTES IN COMPUTER SCIENCE, 2012, 7700: 421-436.
  • [49] Byzantine Stochastic Gradient Descent
    Alistarh, Dan
    Allen-Zhu, Zeyuan
    Li, Jerry
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018.