Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path

Cited by: 0
Authors
Di, Qiwei [1 ]
He, Jiafan [1 ]
Zhou, Dongruo [1 ]
Gu, Quanquan [1 ]
Affiliations
[1] University of California, Los Angeles, Department of Computer Science, Los Angeles, CA 90095, USA
Funding
U.S. National Science Foundation
Keywords
DOI
None available
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel, where an agent repeatedly interacts with a stochastic environment and seeks to reach a certain goal state while minimizing the cumulative cost. Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy. In this paper, we propose a new algorithm that eliminates these restrictive assumptions. Our algorithm is based on extended value iteration with a fine-grained, variance-aware confidence set, where the variance is estimated recursively from high-order moments. Our algorithm achieves an Õ(dB*√K) regret bound, where d is the dimension of the feature mapping in the linear transition kernel, B* is an upper bound on the total cumulative cost of the optimal policy, and K is the number of episodes. Our regret upper bound matches the Ω(dB*√K) lower bound for linear mixture SSPs in Min et al. (2022), which suggests that our algorithm is nearly minimax optimal.
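The variance-aware confidence set described in the abstract is built on variance-weighted ridge regression, the standard primitive in this line of work: low-variance transition samples are weighted up, high-variance ones down. A minimal illustrative sketch of that primitive follows; the class name, the fixed variance floor, and the use of an externally supplied variance estimate are simplifying assumptions, not the paper's exact construction (which estimates variances recursively from high-order moments).

```python
import numpy as np

class WeightedRidgeEstimator:
    """Variance-weighted ridge regression for a linear model target = phi @ theta."""

    def __init__(self, dim, lam=1.0):
        self.Sigma = lam * np.eye(dim)  # regularized, weighted Gram matrix
        self.b = np.zeros(dim)          # weighted sum of target * feature

    def update(self, phi, target, var, var_floor=1e-2):
        # Down-weight noisy samples: weight = 1 / max(estimated variance, floor).
        # The floor keeps the weights bounded when the variance estimate is tiny.
        w = 1.0 / max(var, var_floor)
        self.Sigma += w * np.outer(phi, phi)
        self.b += w * target * phi

    def estimate(self):
        # Ridge solution theta_hat = Sigma^{-1} b; the confidence set is an
        # ellipsoid {theta : ||theta - theta_hat||_Sigma <= beta} around it.
        return np.linalg.solve(self.Sigma, self.b)
```

On noiseless synthetic data the estimator recovers the underlying parameter up to ridge shrinkage, which is the sanity check one would run before layering the confidence-radius machinery on top.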
Pages: 28
Related Papers
50 records in total
  • [1] Cohen, Alon; Efroni, Yonathan; Mansour, Yishay; Rosenberg, Aviv. Minimax Regret for Stochastic Shortest Path. Advances in Neural Information Processing Systems 34 (NeurIPS 2021).
  • [2] Vial, Daniel; Parulekar, Advait; Shakkottai, Sanjay; Srikant, R. Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation. International Conference on Machine Learning, Vol. 162, 2022.
  • [3] Chen, Liyu; Jain, Rahul; Luo, Haipeng. Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP. International Conference on Machine Learning, Vol. 162, 2022.
  • [4] Tarbouriech, Jean; Zhou, Runlong; Du, Simon S.; Pirotta, Matteo; Valko, Michal; Lazaric, Alessandro. Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret. Advances in Neural Information Processing Systems 34 (NeurIPS 2021).
  • [5] Min, Yifei; He, Jiafan; Wang, Tianhao; Gu, Quanquan. Learning Stochastic Shortest Path with Linear Function Approximation. International Conference on Machine Learning, Vol. 162, 2022.
  • [6] Hu, Pihe; Chen, Yu; Huang, Longbo. Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation. International Conference on Machine Learning, Vol. 162, 2022.
  • [7] Wu, Yue; Zhou, Dongruo; Gu, Quanquan. Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation. International Conference on Artificial Intelligence and Statistics, Vol. 151, 2022.
  • [8] Li, Yingkai; Wang, Yining; Zhou, Yuan. Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits. IEEE Transactions on Information Theory, 2024, 70(1): 372-388.
  • [9] Li, Yingkai; Wang, Yining; Zhou, Yuan. Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits. Conference on Learning Theory, Vol. 99, 2019.
  • [10] Xue, Bo; Wang, Guanghui; Wang, Yimu; Zhang, Lijun. Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI), 2020: 2936-2942.