An environment model for nonstationary reinforcement learning

Cited by: 0
Authors
Choi, SPM [1]
Yeung, DY [1]
Zhang, NL [1]
Affiliation
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning in nonstationary environments is generally regarded as an important and yet difficult problem. This paper partially addresses the problem by formalizing a subclass of nonstationary environments. The environment model, called hidden-mode Markov decision process (HM-MDP), assumes that environmental changes are always confined to a small number of hidden modes. A mode basically indexes a Markov decision process (MDP) and evolves with time according to a Markov chain. While HM-MDP is a special case of partially observable Markov decision processes (POMDP), modeling an HM-MDP environment via the more general POMDP model unnecessarily increases the problem complexity. A variant of the Baum-Welch algorithm is developed for model learning requiring less data and time.
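The structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `HiddenModeMDP`, the helper `make_toy_env`, and every probability and reward value are hypothetical, and the ordering of the state update and mode update within a step is an assumption. The sketch only shows that each hidden mode indexes its own MDP, that the mode evolves according to a Markov chain, and that the agent observes the state and reward but never the mode.

```python
import random


class HiddenModeMDP:
    """Toy hidden-mode MDP: each hidden mode indexes one MDP (illustrative only)."""

    def __init__(self, mode_trans, state_trans, rewards, seed=0):
        self.mode_trans = mode_trans    # mode_trans[m][m2] = P(next mode m2 | mode m)
        self.state_trans = state_trans  # state_trans[m][s][a][s2] = P(s2 | s, a, mode m)
        self.rewards = rewards          # rewards[m][s][a] = reward in mode m
        self.rng = random.Random(seed)
        self.mode = 0                   # hidden from the agent
        self.state = 0                  # observed by the agent

    def _sample(self, probs):
        # Draw an index according to the given probability vector.
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def step(self, action):
        m, s = self.mode, self.state
        reward = self.rewards[m][s][action]
        # The state moves according to the MDP selected by the current mode.
        self.state = self._sample(self.state_trans[m][s][action])
        # The mode then evolves by its own Markov chain (ordering is an assumption).
        self.mode = self._sample(self.mode_trans[m])
        # Only the state and reward are returned; the mode stays hidden.
        return self.state, reward


def make_toy_env(seed=0):
    # Two modes, two states, two actions; all numbers are made up.
    mode_trans = [[0.9, 0.1], [0.1, 0.9]]
    # Mode 0: action a leads to state a. Mode 1: action a leads to state 1 - a.
    state_trans = [
        [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]],
        [[[0.0, 1.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]],
    ]
    rewards = [
        [[0.0, 1.0], [0.0, 1.0]],
        [[1.0, 0.0], [1.0, 0.0]],
    ]
    return HiddenModeMDP(mode_trans, state_trans, rewards, seed)
```

Calling `make_toy_env().step(1)` returns the next observed state and the reward. Treated as a POMDP, the same toy environment would carry the pair (mode, state) as its hidden state, which is the more general formulation the abstract contrasts with the HM-MDP.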
Pages: 987-993
Number of pages: 7