Learning a Belief Representation for Delayed Reinforcement Learning

Cited by: 0
Authors
Liotet, Pierre [1]
Venneri, Erick [1]
Restelli, Marcello [1]
Affiliations
[1] Politecnico di Milano, Milan, Italy
Keywords
reinforcement learning; delays; self-attention network; masked autoregressive flows; belief
DOI
10.1109/IJCNN52387.2021.9534358
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper considers sequential decision-making problems in which the interactions between an agent and its environment are affected by delays. Delays may occur in the state observation, in the action execution, or in the reward collection. We consider the delayed Markov Decision Process (MDP) framework for both deterministic and stochastic delays. Given the hardness of the delayed MDP problem, we adopt a heuristic approach and design an algorithm that selects actions based on the belief over the current, unobserved state. We design a self-attention prediction module which, given the last observed state and the sequence of actions taken since, estimates the beliefs over the subsequent states. The algorithm handles deterministic delays and could potentially be extended to stochastic ones. We empirically evaluate the effectiveness of the proposed approach on both deterministic and stochastic control problems affected by deterministic delays.
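As background, for a constant delay d the delayed MDP is known to be reducible to an MDP over the augmented state (s_{t-d}, a_{t-d}, ..., a_{t-1}), and the belief in question is b_t(s) = P(s_t = s | s_{t-d}, a_{t-d}, ..., a_{t-1}). The sketch below is a minimal, illustrative rendering of such a belief module, assuming a continuous state space and a known constant delay; the class name BeliefPredictor, the Transformer encoder, and the diagonal-Gaussian output head are stand-ins chosen for brevity, not the authors' implementation (the paper represents the belief with masked autoregressive flows).

import torch
import torch.nn as nn

class BeliefPredictor(nn.Module):
    # Hypothetical belief module for a constant-delay MDP: self-attention
    # summarizes (s_{t-d}, a_{t-d}, ..., a_{t-1}) and a diagonal Gaussian
    # head parameterizes the belief over the unobserved current state s_t.
    def __init__(self, state_dim, action_dim, max_delay,
                 d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.state_proj = nn.Linear(state_dim, d_model)
        self.action_proj = nn.Linear(action_dim, d_model)
        # Learned positional embeddings so attention can order the actions.
        self.pos = nn.Parameter(torch.zeros(1, max_delay + 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.mean_head = nn.Linear(d_model, state_dim)
        self.log_std_head = nn.Linear(d_model, state_dim)

    def forward(self, last_state, actions):
        # last_state: (batch, state_dim); actions: (batch, d, action_dim),
        # the d actions taken since the last observed state.
        tokens = torch.cat([self.state_proj(last_state).unsqueeze(1),
                            self.action_proj(actions)], dim=1)
        tokens = tokens + self.pos[:, :tokens.size(1)]
        h = self.encoder(tokens)[:, -1]  # summary after the last action
        return self.mean_head(h), self.log_std_head(h)

The agent would then act on this belief instead of the unobserved state, for example sampling s_hat from torch.distributions.Normal(mean, log_std.exp()) and feeding it (or the belief parameters themselves) to the policy.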
Pages: 8
Related Papers (50 total)
  • [1] Towards a Self-Learning Agent: Using Ranking Functions as a Belief Representation in Reinforcement Learning
    Häming, Klaus; Peters, Gabriele
    Neural Processing Letters, 2013, 38(2): 117-129
  • [2] Learning Network Representation through Reinforcement Learning
    Shen, Siqi; Fu, Yongquan; Jia, Adele Lu; Su, Huayou; Wang, Qinglin; Wang, Chengsong; Dou, Yong
    2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020: 3537-3541
  • [3] Decoupling Representation Learning from Reinforcement Learning
    Stooke, Adam; Lee, Kimin; Abbeel, Pieter; Laskin, Michael
    International Conference on Machine Learning (ICML), PMLR 139, 2021
  • [4] Integrating Reinforcement Learning with Models of Representation Learning
    Jones, Matt; Canas, Fabian
    Cognition in Flux, 2010: 1258-1263
  • [5] Masked Contrastive Representation Learning for Reinforcement Learning
    Zhu, Jinhua; Xia, Yingce; Wu, Lijun; Deng, Jiajun; Zhou, Wengang; Qin, Tao; Liu, Tie-Yan; Li, Houqiang
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3421-3433
  • [6] Representation Learning on Graphs: A Reinforcement Learning Application
    Madjiheurem, Sephora; Toni, Laura
    22nd International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 89, 2019
  • [7] Belief Reward Shaping in Reinforcement Learning
    Marom, Ofir; Rosman, Benjamin
    Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018: 3762-3769
  • [8] Delayed Reinforcement Learning by Imitation
    Liotet, Pierre; Maran, Davide; Bisi, Lorenzo; Restelli, Marcello
    International Conference on Machine Learning (ICML), PMLR 162, 2022
  • [9] Provable Benefit of Multitask Representation Learning in Reinforcement Learning
    Cheng, Yuan; Feng, Songtao; Yang, Jing; Zhang, Hong; Liang, Yingbin
    Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022