Learning-to-Learn Stochastic Gradient Descent with Biased Regularization

Cited: 0
Authors
Denevi, Giulia [1,2]
Ciliberto, Carlo [3,4]
Grazzi, Riccardo [1,4]
Pontil, Massimiliano [1,4]
Affiliations
[1] Ist Italiano Tecnol, Genoa, Italy
[2] Univ Genoa, Genoa, Italy
[3] Imperial Coll London, London, England
[4] UCL, London, England
Keywords
ALGORITHM; STABILITY; BOUNDS
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We study the problem of learning-to-learn: inferring a learning algorithm that works well on a family of tasks sampled from an unknown distribution. As the class of algorithms, we consider Stochastic Gradient Descent (SGD) on the true risk regularized by the squared Euclidean distance from a bias vector. We present an average excess risk bound for such a learning algorithm that quantifies the potential benefit of using a bias vector with respect to the unbiased case. We then propose a novel meta-algorithm to estimate the bias term online from a sequence of observed tasks. The small memory footprint and low time complexity of our approach make it appealing in practice, while our theoretical analysis provides guarantees on the generalization properties of the meta-algorithm on new tasks. A key feature of our results is that, when the number of tasks grows and their variance is relatively small, our learning-to-learn approach has a significant advantage over learning each task in isolation by standard SGD without a bias term. Numerical experiments demonstrate the effectiveness of our approach in practice.
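To make the setup in the abstract concrete, below is a minimal, illustrative Python sketch of within-task SGD with biased regularization, together with a simple online estimate of the bias across tasks. It is written for least-squares tasks; the function name sgd_biased, the step size, the averaging of within-task iterates, and the running-average meta-update are illustrative assumptions, not the paper's exact meta-algorithm.

```python
import numpy as np

def sgd_biased(X, y, h, lam=1.0, eta=0.01, epochs=1):
    """SGD on the empirical least-squares risk plus (lam/2) * ||w - h||^2.

    The regularizer pulls the iterates toward the bias vector h; setting
    h = 0 recovers plain regularized SGD (the unbiased case).
    """
    w = h.copy()                      # warm-start at the bias (an assumption)
    iterates = []
    for _ in range(epochs):
        for i in np.random.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i] + lam * (w - h)
            w = w - eta * grad
            iterates.append(w)
    return np.mean(iterates, axis=0)  # averaged iterate, standard in SGD analyses

# Hypothetical meta-loop: estimate the bias online as a running average of the
# per-task solutions -- a simplification of the paper's online meta-algorithm.
rng = np.random.default_rng(0)
d = 5
h = np.zeros(d)
w_center = rng.normal(size=d)                      # common center of the task family
for t in range(1, 51):
    w_task = w_center + 0.1 * rng.normal(size=d)   # low task variance
    X = rng.normal(size=(30, d))
    y = X @ w_task + 0.01 * rng.normal(size=30)
    w_hat = sgd_biased(X, y, h)
    h = h + (w_hat - h) / t                        # online running-average update
```

With h estimated this way, each new task is warm-started at, and pulled toward, a point near the common task center. This is the regime the abstract describes: when the variance across tasks is small, the biased regularizer should outperform learning each task in isolation with h = 0.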
Pages: 10
Related Papers
50 records in total
  • [41] A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces
    Le Lan, Charline
    Greaves, Joshua
    Farebrother, Jesse
    Rowland, Mark
    Pedregosa, Fabian
    Agarwal, Rishabh
    Bellemare, Marc
    arXiv, 2022.
  • [42] Learning-to-learn: the cognitive skills and the beliefs in the assessment of schooling
    Hautamäki, J
    INTERNATIONAL JOURNAL OF PSYCHOLOGY, 2000, 35(3-4): 299-299.
  • [43] Regularization Effect of Random Node Fault/Noise on Gradient Descent Learning Algorithm
    Sum, John
    Leung, Chi-Sing
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34(5): 2619-2632.
  • [44] Unforgeability in Stochastic Gradient Descent
    Baluta, Teodora
    Nikolic, Ivica
    Jain, Racchit
    Aggarwal, Divesh
    Saxena, Prateek
    PROCEEDINGS OF THE 2023 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, CCS 2023, 2023: 1138-1152.
  • [45] Preconditioned Stochastic Gradient Descent
    Li, Xi-Lin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29(5): 1454-1466.
  • [46] Learning-to-Learn Agent Adaptation Policy for Abstractive Summarization
    Mu, Hongzhang
    Liu, Tingwen
    Xu, Hongbo
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022.
  • [47] Stochastic Reweighted Gradient Descent
    El Hanchi, Ayoub
    Stephens, David A.
    Maddison, Chris J.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022.
  • [48] Stochastic gradient descent tricks
    Bottou, Léon
    LECTURE NOTES IN COMPUTER SCIENCE, 2012, 7700: 421-436.
  • [49] Byzantine Stochastic Gradient Descent
    Alistarh, Dan
    Allen-Zhu, Zeyuan
    Li, Jerry
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018.