Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path

Cited by: 0
Authors
Di, Qiwei [1 ]
He, Jiafan [1 ]
Zhou, Dongruo [1 ]
Gu, Quanquan [1 ]
Affiliations
[1] University of California, Los Angeles, Department of Computer Science, Los Angeles, CA 90095, USA
Funding
U.S. National Science Foundation
Keywords
DOI
None available
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel, where an agent repeatedly interacts with a stochastic environment and seeks to reach a certain goal state while minimizing the cumulative cost. Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy. In this paper, we propose a new algorithm that eliminates these restrictive assumptions. Our algorithm is based on extended value iteration with a fine-grained, variance-aware confidence set, where the variance is estimated recursively from high-order moments. Our algorithm achieves an Õ(dB*√K) regret bound, where d is the dimension of the feature mapping in the linear transition kernel, B* is an upper bound on the total cumulative cost of the optimal policy, and K is the number of episodes. Our regret upper bound matches the Ω(dB*√K) lower bound for linear mixture SSPs in Min et al. (2022), which suggests that our algorithm is nearly minimax optimal.
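The variance-aware confidence set described in the abstract is built on variance-weighted ridge regression, the standard primitive in this line of work: low-variance transition samples are weighted up, high-variance ones down. A minimal illustrative sketch of that primitive follows; the class name, the fixed variance floor, and the use of an externally supplied variance estimate are simplifying assumptions, not the paper's exact construction (which estimates variances recursively from high-order moments).

```python
import numpy as np

class WeightedRidgeEstimator:
    """Variance-weighted ridge regression for a linear model target = phi @ theta."""

    def __init__(self, dim, lam=1.0):
        self.Sigma = lam * np.eye(dim)  # regularized, weighted Gram matrix
        self.b = np.zeros(dim)          # weighted sum of target * feature

    def update(self, phi, target, var, var_floor=1e-2):
        # Down-weight noisy samples: weight = 1 / max(estimated variance, floor).
        # The floor keeps the weights bounded when the variance estimate is tiny.
        w = 1.0 / max(var, var_floor)
        self.Sigma += w * np.outer(phi, phi)
        self.b += w * target * phi

    def estimate(self):
        # Ridge solution theta_hat = Sigma^{-1} b; the confidence set is an
        # ellipsoid {theta : ||theta - theta_hat||_Sigma <= beta} around it.
        return np.linalg.solve(self.Sigma, self.b)
```

On noiseless synthetic data the estimator recovers the underlying parameter up to ridge shrinkage, which is the sanity check one would run before layering the confidence-radius machinery on top.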
Pages: 28
Related Papers
50 records in total
  • [1] Cohen, Alon; Efroni, Yonathan; Mansour, Yishay; Rosenberg, Aviv. Minimax Regret for Stochastic Shortest Path. Advances in Neural Information Processing Systems 34 (NeurIPS 2021).
  • [2] Vial, Daniel; Parulekar, Advait; Shakkottai, Sanjay; Srikant, R. Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation. International Conference on Machine Learning, Vol. 162, 2022.
  • [3] Chen, Liyu; Jain, Rahul; Luo, Haipeng. Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP. International Conference on Machine Learning, Vol. 162, 2022.
  • [4] Tarbouriech, Jean; Zhou, Runlong; Du, Simon S.; Pirotta, Matteo; Valko, Michal; Lazaric, Alessandro. Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret. Advances in Neural Information Processing Systems 34 (NeurIPS 2021).
  • [5] Min, Yifei; He, Jiafan; Wang, Tianhao; Gu, Quanquan. Learning Stochastic Shortest Path with Linear Function Approximation. International Conference on Machine Learning, Vol. 162, 2022.
  • [6] Hu, Pihe; Chen, Yu; Huang, Longbo. Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation. International Conference on Machine Learning, Vol. 162, 2022.
  • [7] Wu, Yue; Zhou, Dongruo; Gu, Quanquan. Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation. International Conference on Artificial Intelligence and Statistics, Vol. 151, 2022.
  • [8] Li, Yingkai; Wang, Yining; Zhou, Yuan. Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits. IEEE Transactions on Information Theory, 2024, 70(1): 372-388.
  • [9] Li, Yingkai; Wang, Yining; Zhou, Yuan. Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits. Conference on Learning Theory, Vol. 99, 2019.
  • [10] Xue, Bo; Wang, Guanghui; Wang, Yimu; Zhang, Lijun. Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI), 2020: 2936-2942.