Regularized Policies are Reward Robust

Citations: 0
Authors
Husain, Hisham [1 ]
Ciosek, Kamil [2 ,3 ]
Tomioka, Ryota [3 ]
Affiliations
[1] Australian Natl Univ, CSIRO Data61, Canberra, ACT, Australia
[2] Spotify Res, London, England
[3] Microsoft Res Cambridge, Cambridge, England
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Entropic regularization of policies in Reinforcement Learning (RL) is a commonly used heuristic to ensure that the learned policy explores the state space sufficiently before it overfits to a locally optimal policy. The primary motivations for using entropy are exploration and the disambiguation of optimal policies; however, its theoretical effects are not entirely understood. In this work, we study the more general regularized RL objective and, using Fenchel duality, derive its dual, which takes the form of an adversarial reward problem. In particular, we find that the optimal policy under a regularized objective is precisely an optimal policy of a reinforcement learning problem with a worst-case adversarial reward. Our result allows us to reinterpret the popular entropic regularization scheme as a form of robustification. Furthermore, owing to the generality of our analysis, our results apply to other existing regularization schemes as well. Our results thus give insight into the effects of regularizing policies and deepen our understanding of exploration through robust rewards at large.
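As a rough illustration of the duality described in the abstract (a sketch only; the notation, in particular a convex regularizer \Omega acting on the occupancy measure d_\pi of policy \pi and its convex conjugate \Omega^{\star}, is assumed here rather than taken from the paper), Fenchel duality turns the regularized objective into a worst-case reward problem:

\max_{\pi}\;\langle d_\pi, r\rangle - \Omega(d_\pi)
  \;=\; \max_{\pi}\,\min_{r'}\;\langle d_\pi, r'\rangle + \Omega^{\star}(r - r'),

which follows from writing \Omega(d_\pi) = \sup_{y}\,\langle d_\pi, y\rangle - \Omega^{\star}(y) (valid for closed convex \Omega) and substituting r' = r - y. The inner minimization acts as an adversary that perturbs the true reward r, paying a penalty \Omega^{\star}(r - r') for the perturbation; the entropic scheme corresponds to one particular choice of \Omega.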
Pages: 64-72
Number of pages: 9