Multi-Agent Reinforcement Learning Algorithm with Variable Optimistic-Pessimistic Criterion

Cited by: 1
Authors
Akchurina, Natalia [1 ]
Affiliation
[1] Univ Gesamthsch Paderborn, Int Grad Sch Dynam Intelligent Syst, D-4790 Paderborn, Germany
Source
ECAI 2008, PROCEEDINGS | 2008, Vol. 178
DOI
10.3233/978-1-58603-891-5-433
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A reinforcement learning algorithm for multi-agent systems based on a variable Hurwicz optimistic-pessimistic criterion is proposed, and a formal proof of its convergence is given. Hurwicz's criterion makes it possible to embed prior knowledge of how friendly the environment in which the agent is supposed to function will be. Thorough testing of the developed algorithm against well-known reinforcement learning algorithms has shown that in many cases its successful performance can be explained by its tendency to force the other agents to follow the policy that is more profitable for it. In addition, the variability of Hurwicz's criterion allows it to converge to a best response against opponents with stationary policies.
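The Hurwicz criterion blends the best-case and worst-case outcomes of each own action with an optimism coefficient α: fully optimistic play (α = 1) assumes a friendly opponent, fully pessimistic play (α = 0) assumes an adversarial one. A minimal tabular sketch of this idea in a two-player setting is shown below; it is an illustration of the criterion, not the paper's exact algorithm, and the function names (`hurwicz_value`, `q_update`) and hyperparameters are this sketch's own assumptions.

```python
import numpy as np

def hurwicz_value(Q_s, alpha):
    """Hurwicz value of a state.

    Q_s[a, o] holds the estimated payoff of own action a against
    opponent action o. For each own action we blend the optimistic
    (max over o) and pessimistic (min over o) outcomes with weight
    alpha, then pick the own action maximizing the blend.
    Returns (value, best_own_action).
    """
    blended = alpha * Q_s.max(axis=1) + (1 - alpha) * Q_s.min(axis=1)
    return blended.max(), int(blended.argmax())

def q_update(Q, s, a, o, r, s_next, alpha_hurwicz, lr=0.1, gamma=0.9):
    """One tabular update toward r + gamma * HurwiczValue(s_next).

    Q is a dict mapping each state to its payoff matrix Q[s][a, o].
    """
    v_next, _ = hurwicz_value(Q[s_next], alpha_hurwicz)
    Q[s][a, o] += lr * (r + gamma * v_next - Q[s][a, o])
```

The "variable" aspect of the paper's criterion could correspond to adjusting `alpha_hurwicz` over time as the agent learns how friendly the other agents actually are.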
Pages: 433 / +
Page count: 2
Related papers
50 results in total
  • [1] Optimistic-Pessimistic Q-Learning Algorithm for Multi-Agent Systems
    Akchurina, Natalia
    MULTIAGENT SYSTEM TECHNOLOGIES, PROCEEDINGS, 2008, 5244 : 13 - 24
  • [2] Optimistic sequential multi-agent reinforcement learning with motivational communication
    Huang, Anqi
    Wang, Yongli
    Zhou, Xiaoliang
    Zou, Haochen
    Dong, Xu
    Che, Xun
    NEURAL NETWORKS, 2024, 179
  • [3] Optimistic Value Instructors for Cooperative Multi-Agent Reinforcement Learning
    Li, Chao
    Zhang, Yupeng
    Wang, Jianqi
    Hu, Yujing
    Dong, Shaokang
    Li, Wenbin
    Lv, Tangjie
    Fan, Changjie
    Gao, Yang
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 17453 - 17460
  • [4] Cross-Observability Optimistic-Pessimistic Safe Reinforcement Learning for Interactive Motion Planning With Visual Occlusion
    Hou, Xiaohui
    Gan, Minggang
    Wu, Wei
    Ji, Yuan
    Zhao, Shiyue
    Chen, Jie
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (11) : 17602 - 17613
  • [5] Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning
    Zhao, Xutong
    Pan, Yangchen
    Xiao, Chenjun
    Chandar, Sarath
    Rajendran, Janarthanan
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 2529 - 2540
  • [6] Cautiously-Optimistic Knowledge Sharing for Cooperative Multi-Agent Reinforcement Learning
    Ba, Yanwen
    Liu, Xuan
    Chen, Xinning
    Wang, Hao
    Xu, Yang
    Li, Kenli
    Zhang, Shigeng
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 17299 - 17307
  • [7] LMRL: A multi-agent reinforcement learning model and algorithm
    Wang, BN
    Gao, Y
    Chen, ZQ
    Xie, JY
    Chen, SF
    Third International Conference on Information Technology and Applications, Vol 1, Proceedings, 2005, : 303 - 307
  • [8] Sequence to Sequence Multi-agent Reinforcement Learning Algorithm
    Shi T.
    Wang L.
    Huang Z.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2021, 34 (03): : 206 - 213
  • [9] A new accelerating algorithm for multi-agent reinforcement learning
    Zhang, Rubo
    Zhong, Yu
    Gu, Guochang
    Journal of Harbin Institute of Technology, 2005, (01) : 48 - 51
  • [10] Multi-Agent Reinforcement Learning
    Stankovic, Milos
    2016 13TH SYMPOSIUM ON NEURAL NETWORKS AND APPLICATIONS (NEUREL), 2016, : 43 - 43