Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization

Cited: 0
Authors
Haghifam, Mahdi [1 ,2 ]
Rodriguez-Galvez, Borja [3 ]
Thobaben, Ragnar [3 ]
Skoglund, Mikael [3 ]
Roy, Daniel M. [1 ,2 ]
Dziugaite, Gintare Karolina [4 ,5 ,6 ]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Vector Inst, Toronto, ON, Canada
[3] KTH Royal Inst Technol, Stockholm, Sweden
[4] Google Res, Toronto, ON, Canada
[5] Mila, Montreal, PQ, Canada
[6] McGill, Montreal, PQ, Canada
Funding
Swedish Research Council; Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
STABILITY;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline classification code
0812
Abstract
To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.
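For context, the following background is not part of the record itself. A standard instance of the "input-output mutual information" bounds named in the abstract is the result of Xu and Raginsky (2017): if the loss \ell(w, Z) is \sigma-subgaussian for every hypothesis w, and a learning algorithm maps an i.i.d. sample S = (Z_1, \dots, Z_n) to an output W, then

\[
\bigl| \mathbb{E}\,[\, L_{\mathcal{D}}(W) - L_S(W) \,] \bigr| \;\le\; \sqrt{\frac{2\sigma^2 \, I(W; S)}{n}},
\]

where L_{\mathcal{D}} and L_S denote population and empirical risk and I(W; S) is the mutual information between the sample and the algorithm's output. The "noisy surrogate" tactic the abstract refers to perturbs the final iterate of gradient descent,

\[
\widetilde{W} = W + \xi, \qquad \xi \sim \mathcal{N}(0, \sigma_{\mathrm{noise}}^2 I_d),
\]

(the notation \sigma_{\mathrm{noise}} is illustrative), which makes I(\widetilde{W}; S) finite so that bounds of the above form apply; the paper's negative result is that even such surrogate analyses cannot recover the minimax rates of gradient descent in stochastic convex optimization.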
Pages: 663-706
Page count: 44
Related papers
(showing items 31-40 of 50)
  • [31] Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting
    Srinivas, Niranjan
    Krause, Andreas
    Kakade, Sham M.
    Seeger, Matthias W.
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2012, 58 (05) : 3250 - 3265
  • [32] Information-Theoretic Bounds on Transfer Generalization Gap Based on Jensen-Shannon Divergence
    Jose, Sharu Theresa
    Simeone, Osvaldo
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 1461 - 1465
  • [33] On the Generalization for Transfer Learning: An Information-Theoretic Analysis
    Wu, Xuetong
    Manton, Jonathan H.
    Aickelin, Uwe
    Zhu, Jingge
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2024, 70 (10) : 7089 - 7124
  • [34] Towards a Unified Information-Theoretic Framework for Generalization
    Haghifam, Mahdi
    Dziugaite, Gintare Karolina
    Moran, Shay
    Roy, Daniel M.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021
  • [35] An information-theoretic route from generalization in expectation to generalization in probability
    Alabdulmohsin, Ibrahim
    ARTIFICIAL INTELLIGENCE AND STATISTICS, 2017, 54 : 92 - 100
  • [36] An Information-Theoretic View of Stochastic Localization
    El Alaoui, Ahmed
    Montanari, Andrea
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2022, 68 (11) : 7423 - 7426
  • [37] Models and information-theoretic bounds for nanopore sequencing
    Mao, Wei
    Diggavi, Suhas
    Kannan, Sreeram
    2017 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2017, : 2458 - 2462
  • [38] Generalization performance of multi-pass stochastic gradient descent with convex loss functions
    Lei, Yunwen
    Hu, Ting
    Tang, Ke
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [39] Information-theoretic bounds for steganography in visual multimedia
    El-Arsh, Hassan Y.
    Abdelaziz, Amr
    Elliethy, Ahmed
    Aly, H. A.
    Gulliver, T. Aaron
    JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2025, 89
  • [40] Information-Theoretic Bounds for Adaptive Sparse Recovery
    Aksoylar, Cem
    Saligrama, Venkatesh
    2014 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2014, : 1311 - 1315