There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning

Cited by: 0
Authors
Grinsztajn, Nathan [1]
Ferret, Johan [1, 2]
Pietquin, Olivier [2]
Preux, Philippe [1]
Geist, Matthieu [2]
Affiliations
[1] Univ Lille, CNRS, CRIStAL, Inria, Scool Team, Lille, France
[2] Google Res, Brain Team, Mountain View, CA 94043 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose to learn to distinguish reversible from irreversible actions for better-informed decision-making in Reinforcement Learning (RL). From theoretical considerations, we show that approximate reversibility can be learned through a simple surrogate task: ranking randomly sampled trajectory events in chronological order. Intuitively, pairs of events that are always observed in the same order are likely to be separated by an irreversible sequence of actions. Conveniently, learning the temporal order of events can be done in a fully self-supervised way, which we use to estimate the reversibility of actions from experience, without any priors. We propose two different strategies that incorporate reversibility in RL agents, one strategy for exploration (RAE) and one strategy for control (RAC). We demonstrate the potential of reversibility-aware agents in several environments, including the challenging Sokoban game. In synthetic tasks, we show that we can learn control policies that never fail and reduce the side effects of interactions to zero, even without access to the reward function.
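The intuition in the abstract (pairs of events always seen in the same order are probably separated by something irreversible) can be illustrated with a small tabular sketch. This is not the authors' implementation: the paper trains a model on the ordering task, whereas the code below simply counts orderings. The three-state environment, the function names `sample_trajectory` and `estimate_precedence`, and the parameter `p_lock` are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def sample_trajectory(rng, length=30, p_lock=0.1):
    """Random walk on a hypothetical toy chain with states 0, 1 and 2.

    States 0 and 1 can be left and revisited freely; state 2 is absorbing,
    so the move 1 -> 2 is the only irreversible transition.
    """
    state = 0
    states = [state]
    for _ in range(length - 1):
        if state == 0:
            state = rng.choice([0, 1])
        elif state == 1:
            state = 2 if rng.random() < p_lock else rng.choice([0, 1])
        # state 2: there is no way back, so the walk stays there
        states.append(state)
    return states


def estimate_precedence(trajectories):
    """Count, for every ordered pair of distinct states, how often the first
    was observed before the second within a trajectory."""
    before = defaultdict(int)
    for states in trajectories:
        for i in range(len(states)):
            for j in range(i + 1, len(states)):
                if states[i] != states[j]:
                    before[(states[i], states[j])] += 1

    def psi(x, y):
        # Empirical fraction of observations in which x came before y.
        forward, backward = before[(x, y)], before[(y, x)]
        total = forward + backward
        return forward / total if total else 0.5

    return psi


rng = random.Random(0)
trajectories = [sample_trajectory(rng) for _ in range(200)]
psi = estimate_precedence(trajectories)

for x, y in [(0, 1), (1, 2), (2, 1)]:
    print(f"psi({x}, {y}) = {psi(x, y):.2f}")
```

In this toy setting the pair (1, 2) is only ever observed in one order, so its estimate saturates at 1, while the pair (0, 1) is observed in both orders and stays strictly between 0 and 1. No reward signal is used anywhere, which mirrors the self-supervised nature of the surrogate task described above.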
Pages: 14
Related papers
50 in total
  • [1] Self-Supervised Attention-Aware Reinforcement Learning
    Wu, Haiping
    Khetarpal, Khimya
    Precup, Doina
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 10311 - 10319
  • [2] Reinforcement Learning with Attention that Works: A Self-Supervised Approach
    Manchin, Anthony
    Abbasnejad, Ehsan
    van den Hengel, Anton
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2019, PT V, 2019, 1143 : 223 - 230
  • [3] Experimental Assessment of Reversibility-Aware Deep Reinforcement Learning for Optical Data Center Network Reconfiguration
    Sica, Massimiliano
    Singh, Sandeep Kumar
    Proietti, Roberto
    Tornatore, Massimo
    Ben Yoo, S. J.
    [J]. 2023 INTERNATIONAL CONFERENCE ON OPTICAL NETWORK DESIGN AND MODELING, ONDM, 2023,
  • [4] Intrinsically Motivated Self-supervised Learning in Reinforcement Learning
    Zhao, Yue
    Du, Chenzhuang
    Zhao, Hang
    Li, Tiejun
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2022), 2022, : 3605 - 3615
  • [5] Geography-Aware Self-Supervised Learning
    Ayush, Kumar
    Uzkent, Burak
    Meng, Chenlin
    Tanmay, Kumar
    Burke, Marshall
    Lobell, David
    Ermon, Stefano
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 10161 - 10170
  • [6] Self-Supervised Reinforcement Learning for Recommender Systems
    Xin, Xin
    Karatzoglou, Alexandros
    Arapakis, Ioannis
    Jose, Joemon M.
    [J]. PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 931 - 940
  • [7] Knowledge-aware reasoning with self-supervised reinforcement learning for explainable recommendation in MOOCs
    Lin, Yuanguo
    Zhang, Wei
    Lin, Fan
    Zeng, Wenhua
    Zhou, Xiuze
    Wu, Pengcheng
    [J]. NEURAL COMPUTING & APPLICATIONS, 2024, 36 (08): : 4115 - 4132
  • [8] Knowledge-aware reasoning with self-supervised reinforcement learning for explainable recommendation in MOOCs
    Yuanguo Lin
    Wei Zhang
    Fan Lin
    Wenhua Zeng
    Xiuze Zhou
    Pengcheng Wu
    [J]. Neural Computing and Applications, 2024, 36 : 4115 - 4132
  • [9] Self-Supervised Reinforcement Learning with dual-reward for knowledge-aware recommendation
    Zhang, Wei
    Lin, Yuanguo
    Liu, Yong
    You, Huanyu
    Wu, Pengcheng
    Lin, Fan
    Zhou, Xiuze
    [J]. APPLIED SOFT COMPUTING, 2022, 131
  • [10] Self-Supervised Discovering of Interpretable Features for Reinforcement Learning
    Shi, Wenjie
    Huang, Gao
    Song, Shiji
    Wang, Zhuoyuan
    Lin, Tingyu
    Wu, Cheng
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (05) : 2712 - 2724