Hybrid Reward Architecture for Reinforcement Learning

Times cited: 0

Authors:

van Seijen, Harm [1]

Fatemi, Mehdi [1]

Romoff, Joshua [1, 2]

Laroche, Romain [1]

Barnes, Tavian [1]

Tsang, Jeffrey [1]

Affiliations:

[1] Microsoft Maluuba, Montreal, PQ, Canada

[2] McGill Univ, Montreal, PQ, Canada

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017) | 2017, Vol. 30

Keywords: (none)

DOI: Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very slow and unstable. This paper contributes towards tackling such challenging domains, by proposing a new method, called Hybrid Reward Architecture (HRA). HRA takes as input a decomposed reward function and learns a separate value function for each component reward function. Because each component typically only depends on a subset of all features, the corresponding value function can be approximated more easily by a low-dimensional representation, enabling more effective learning. We demonstrate HRA on a toy-problem and the Atari game Ms. Pac-Man, where HRA achieves above-human performance.
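The core mechanism described in the abstract, one value function per reward component combined for action selection, can be illustrated with a minimal tabular sketch. This is not the authors' implementation: it replaces the deep network with lookup tables, assumes the heads are combined by a plain unweighted sum, and uses a hypothetical five-state corridor (`corridor_step`) with two reward components. The names `TabularHRA`, `corridor_step`, and `train` are illustrative only.

```python
import random
from typing import List, Tuple


class TabularHRA:
    """Tabular sketch of the HRA idea: one Q-table (head) per reward component."""

    def __init__(self, n_states: int, n_actions: int, n_components: int,
                 alpha: float = 0.1, gamma: float = 0.9) -> None:
        self.n_actions = n_actions
        self.alpha = alpha
        self.gamma = gamma
        # q[k][s][a] estimates the return of reward component k alone.
        self.q = [[[0.0] * n_actions for _ in range(n_states)]
                  for _ in range(n_components)]

    def aggregate(self, s: int) -> List[float]:
        # Assumption: heads are combined by an unweighted sum.
        return [sum(head[s][a] for head in self.q) for a in range(self.n_actions)]

    def act(self, s: int, epsilon: float) -> int:
        # Epsilon-greedy with respect to the aggregated values.
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        values = self.aggregate(s)
        return max(range(self.n_actions), key=lambda a: values[a])

    def update(self, s: int, a: int, rewards: Tuple[float, ...],
               s_next: int, done: bool) -> None:
        # Each head runs its own Q-learning update on its own reward component,
        # bootstrapping from its own greedy value (one of several possible choices).
        for head, r in zip(self.q, rewards):
            target = r if done else r + self.gamma * max(head[s_next])
            head[s][a] += self.alpha * (target - head[s][a])


def corridor_step(s: int, a: int) -> Tuple[int, Tuple[float, float], bool]:
    # Hypothetical environment: states 0..4, action 0 = left, 1 = right.
    # Component 0 pays 1.0 for reaching state 4; component 1 pays 0.2 for state 0.
    s_next = s - 1 if a == 0 else s + 1
    rewards = (1.0 if s_next == 4 else 0.0, 0.2 if s_next == 0 else 0.0)
    return s_next, rewards, s_next in (0, 4)


def train(episodes: int = 2000, epsilon: float = 0.5, seed: int = 0) -> TabularHRA:
    random.seed(seed)
    agent = TabularHRA(n_states=5, n_actions=2, n_components=2)
    for _ in range(episodes):
        s, done, steps = 2, False, 0
        while not done and steps < 100:
            a = agent.act(s, epsilon)
            s_next, rewards, done = corridor_step(s, a)
            agent.update(s, a, rewards, s_next, done)
            s, steps = s_next, steps + 1
    return agent


agent = train()
print(agent.act(2, epsilon=0.0))  # greedy action from the start state: 1 (right)
```

Each head only ever sees its own reward component, so its table stays small and simple; the policy emerges from summing the heads. A weighted sum, or training the heads against the aggregated policy instead of their own greedy actions, would be variants of the same sketch.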


Pages: 11
