Regularized Policies are Reward Robust

Cited by: 0
Authors
Husain, Hisham [1 ]
Ciosek, Kamil [2 ,3 ]
Tomioka, Ryota [3 ]
Affiliations
[1] Australian Natl Univ, CSIRO Data61, Canberra, ACT, Australia
[2] Spotify Res, London, England
[3] Microsoft Res Cambridge, Cambridge, England
Keywords
(none listed)
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Entropic regularization of policies in Reinforcement Learning (RL) is a commonly used heuristic to ensure that the learned policy explores the state space sufficiently before converging to a locally optimal policy. The primary motivation for using entropy is exploration and the disambiguation of optimal policies; however, its theoretical effects are not entirely understood. In this work, we study the more general regularized RL objective and, using Fenchel duality, derive its dual problem, which takes the form of an adversarial reward problem. In particular, we find that the optimal policy of a regularized objective is precisely an optimal policy of a reinforcement learning problem under a worst-case adversarial reward. This result lets us reinterpret the popular entropic regularization scheme as a form of robustification. Furthermore, due to their generality, our results apply to other existing regularization schemes. They thus give insight into the effects of regularizing policies and deepen our understanding of exploration through robust rewards at large.
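The duality step behind this claim can be sketched as follows; the notation below is a reconstruction from the abstract (occupancy measure mu, convex regularizer Omega with convex conjugate Omega*), not the paper's verbatim formulation.

% Sketch of the Fenchel-duality argument (reconstructed notation; the
% paper's exact statement may differ). \mu is a policy's occupancy
% measure, r the true reward, \Omega a convex regularizer, \Omega^{*}
% its convex conjugate, and \langle \mu, r \rangle the expected return.
\[
  \max_{\mu}\; \langle \mu, r \rangle - \Omega(\mu)
  \;=\;
  \max_{\mu}\, \min_{r'}\; \langle \mu, r' \rangle + \Omega^{*}(r - r'),
\]
% which follows from \Omega(\mu) = \sup_{r''} \langle \mu, r'' \rangle
% - \Omega^{*}(r'') (Fenchel--Moreau, for convex lower-semicontinuous
% \Omega) after the substitution r' = r - r''. The inner minimization
% is the adversarial reward problem: the agent is evaluated under the
% worst-case reward r', while the adversary pays a penalty
% \Omega^{*}(r - r') for deviating from the true reward r.

For entropic regularization specifically, Omega is a (negative) entropy term, whose convex conjugate is a log-sum-exp penalty that keeps the adversarial reward softly constrained near r; this is one way to read the abstract's claim that entropy acts as robustification.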
Pages: 64-72 (9 pages)