Regularized Policies are Reward Robust

Cited by: 0
Authors:
Husain, Hisham [1]
Ciosek, Kamil [2,3]
Tomioka, Ryota [3]
Affiliations:
[1] Australian Natl Univ, CSIRO Data61, Canberra, ACT, Australia
[2] Spotify Res, London, England
[3] Microsoft Res Cambridge, Cambridge, England
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract:
Entropic regularization of policies in Reinforcement Learning (RL) is a commonly used heuristic to ensure that the learned policy explores the state space sufficiently before it overfits to a locally optimal policy. The primary motivations for using entropy are exploration and the disambiguation of optimal policies; however, its theoretical effects are not entirely understood. In this work, we study the more general regularized RL objective and, using Fenchel duality, derive the dual problem, which takes the form of an adversarial reward problem. In particular, we find that the optimal policy found by a regularized objective is precisely an optimal policy of a reinforcement learning problem under a worst-case adversarial reward. Our result allows us to reinterpret the popular entropic regularization scheme as a form of robustification. Furthermore, due to the generality of our results, they apply to other existing regularization schemes as well. Our results thus give insight into the effects of regularizing policies and deepen our understanding of exploration through robust rewards at large.
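The abstract's duality claim can be made concrete in the simplest (one-step, bandit-style) setting. The sketch below is an illustration, not the paper's construction: all names (`tau`, `r`, `dual_value`) and the choice of a 3-armed bandit are assumptions made here. It checks numerically that the entropic-regularized objective `⟨r, π⟩ + τ·H(π)` is maximized by a softmax policy with value `τ·logsumexp(r/τ)`, and that, via the Fenchel conjugate of negative entropy, the same objective equals a minimum over adversarial rewards — the worst-case reward being attained at `r' = τ·log π`.

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 0.5                       # regularization temperature (illustrative choice)
r = np.array([1.0, 0.3, -0.2])  # one-step ("bandit") reward vector

def entropy(p):
    return -np.sum(p * np.log(p))

def J(p):
    """Entropic-regularized objective <r, p> + tau * H(p)."""
    return r @ p + tau * entropy(p)

# Closed-form maximizer: softmax(r / tau); optimal value: tau * logsumexp(r / tau).
pi_star = np.exp(r / tau) / np.sum(np.exp(r / tau))
assert np.isclose(J(pi_star), tau * np.log(np.sum(np.exp(r / tau))))

# Fenchel-dual (adversarial-reward) form:
#     J(p) = min_{r'} <r - r', p> + tau * logsumexp(r' / tau),
# i.e. the regularized value is the value under a worst-case perturbed reward
# plus the conjugate penalty. The inner minimum is attained at r' = tau * log p.
def dual_value(p, r_prime):
    return (r - r_prime) @ p + tau * np.log(np.sum(np.exp(r_prime / tau)))

p = rng.dirichlet(np.ones(3))   # an arbitrary policy on 3 arms
assert np.isclose(dual_value(p, tau * np.log(p)), J(p))

# Any other candidate adversarial reward only increases the dual value,
# confirming that r' = tau * log p is the worst case:
for _ in range(1000):
    assert dual_value(p, rng.normal(size=3)) >= J(p) - 1e-9
```

The same structure is what the paper derives in the general case: replacing negative entropy by an arbitrary convex regularizer Ω replaces the log-sum-exp penalty by the convex conjugate Ω*, so the regularized optimum is always an optimum under some adversarial reward.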
Pages: 64-72 (9 pages)