Minimax weight learning for absorbing MDPs

Times Cited: 0
Authors
Li, Fengying [1 ]
Li, Yuqiang [1 ]
Wu, Xianyi [1 ]
Affiliation
[1] East China Normal Univ, Sch Stat, KLATASDS MOE, Shanghai 200062, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Absorbing MDP; Off-policy; Minimax weight learning; Policy evaluation; Occupancy measure; MODELS;
DOI
10.1007/s00362-023-01491-4
CLC Number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject Classification Codes
020208; 070103; 0714;
Abstract
Reinforcement learning policy evaluation problems are often modeled as finite-horizon or discounted/average-reward infinite-horizon Markov Decision Processes (MDPs). In this paper, we study undiscounted off-policy evaluation for absorbing MDPs. Given a dataset consisting of i.i.d. episodes collected under a given truncation level, we propose an algorithm (referred to as MWLA in the text) that directly estimates the expected return via the importance ratio of the state-action occupancy measure. A Mean Square Error (MSE) bound for the MWLA method is provided, and the dependence of the statistical error on the data size and the truncation level is analyzed. The performance of the algorithm is illustrated by computational experiments in an episodic taxi environment.
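To make the idea in the abstract concrete, here is a minimal tabular sketch of minimax weight learning for an undiscounted absorbing MDP. With one-hot (tabular) features for both the weight function w and the test function f, the occupancy-balance equation becomes a linear system, the inner maximization over f has a closed form, and w is recovered by regularized least squares. The function name mwla_tabular, the episode format, and the ridge regularizer are illustrative assumptions, not the paper's exact MWLA estimator.

```python
import numpy as np

def mwla_tabular(episodes, pi, n_states, n_actions, ridge=1e-6):
    """Sketch of minimax weight learning on an absorbing MDP (assumptions:
    episodes is a list of trajectories [(s, a, r, s_next, done), ...] drawn
    i.i.d. under the behavior policy; pi is the target policy, shape (S, A)).
    Returns the estimated occupancy ratio w and the plug-in value estimate."""
    d = n_states * n_actions
    idx = lambda s, a: s * n_actions + a
    n = len(episodes)
    # For the true ratio w, the undiscounted balance equation reads
    # A w + b = 0, where column idx(s,a) of A accumulates
    # phi_bar(s') - phi(s,a) over observed transitions (phi_bar averages
    # one-hot features over pi, and f vanishes at the absorbing state),
    # and b is the per-episode initial-state term E[phi_bar(s0)].
    A = np.zeros((d, d))
    b = np.zeros(d)
    for ep in episodes:
        s0 = ep[0][0]
        for a0 in range(n_actions):
            b[idx(s0, a0)] += pi[s0, a0] / n
        for (s, a, r, s_next, done) in ep:
            j = idx(s, a)
            A[j, j] -= 1.0 / n
            if not done:
                for a_next in range(n_actions):
                    A[idx(s_next, a_next), j] += pi[s_next, a_next] / n
    # Closed-form minimax solution over the unit ball of test functions:
    # minimize ||A w + b||^2 + ridge * ||w||^2 by regularized least squares.
    w = np.linalg.solve(A.T @ A + ridge * np.eye(d), -A.T @ b)
    # Expected return under pi: average per-episode weighted reward sum.
    value = sum(w[idx(s, a)] * r
                for ep in episodes
                for (s, a, r, s_next, done) in ep) / n
    return w, value
```

In this sketch the truncation level enters through the episode length: trajectories cut off before absorption bias the balance equation, which is the data-size/truncation trade-off the MSE bound in the paper quantifies.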
Pages: 3545-3582
Number of Pages: 38
Related Papers (50 in total; entries [21]-[30] shown)
  • [21] Cooperative Online Learning in Stochastic and Adversarial MDPs
    Lancewicki, Tal
    Rosenberg, Aviv
    Mansour, Yishay
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [22] Reduction Techniques for Model Checking and Learning in MDPs
    Bharadwaj, Suda
    Le Roux, Stephane
    Perez, Guillermo A.
    Topcu, Ufuk
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017: 4273-4279
  • [23] Learning option MDPs from small data
    Zehfroosh, Ashkan
    Tanner, Herbert G.
    Heinz, Jeffrey
    2018 ANNUAL AMERICAN CONTROL CONFERENCE (ACC), 2018: 252-257
  • [24] Minimax Model Learning
    Voloshin, Cameron
    Jiang, Nan
    Yue, Yisong
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [25] Reinforcement Learning in Finite MDPs: PAC Analysis
    Strehl, Alexander L.
    Li, Lihong
    Littman, Michael L.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2009, 10: 2413-2444
  • [26] Reinforcement Learning in Reward-Mixing MDPs
    Kwon, Jeongyeol
    Efroni, Yonathan
    Caramanis, Constantine
    Mannor, Shie
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [27] TeXDYNA: Hierarchical Reinforcement Learning in Factored MDPs
    Kozlova, Olga
    Sigaud, Olivier
    Meyer, Christophe
    FROM ANIMALS TO ANIMATS 11, 2010, 6226: 489+
  • [28] Safety-Constrained Reinforcement Learning for MDPs
    Junges, Sebastian
    Jansen, Nils
    Dehnert, Christian
    Topcu, Ufuk
    Katoen, Joost-Pieter
    TOOLS AND ALGORITHMS FOR THE CONSTRUCTION AND ANALYSIS OF SYSTEMS (TACAS 2016), 2016, 9636: 130-146
  • [29] Learning to Act in Decentralized Partially Observable MDPs
    Dibangoye, Jilles S.
    Buffet, Olivier
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [30] Belief Propagation for MiniMax Weight Matching
    Yuan, Mindi
    Li, Shen
    Shen, Wei
    Pavlidis, Yannis
    MODELLING, COMPUTATION AND OPTIMIZATION IN INFORMATION SYSTEMS AND MANAGEMENT SCIENCES - MCO 2015, PT 1, 2015, 359: 37-45