Exploiting Additive Structure in Factored MDPs for Reinforcement Learning

Cited by: 0
Authors
Degris, Thomas [1 ]
Sigaud, Olivier [1 ]
Wuillemin, Pierre-Henri [1 ]
Affiliations
[1] Univ Paris 06, F-75005 Paris, France
Source
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
SDYNA is a framework able to address large, discrete and stochastic reinforcement learning problems. It incrementally learns an FMDP representing the problem to solve while using FMDP planning techniques to build an efficient policy. SPITI, an instantiation of SDYNA, uses a planning method based on dynamic programming, which cannot exploit the additive structure of an FMDP. In this paper, we present two new instantiations of SDYNA, namely ULP and UNATLP, using a linear programming based planning method that can exploit the additive structure of an FMDP and address problems out of reach of SPITI.
Pages: 15-26
Number of pages: 12
Related papers
50 in total
  • [1] Efficient reinforcement learning in factored MDPs
    Kearns, M
    Koller, D
    [J]. IJCAI-99: PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOLS 1 & 2, 1999, : 740 - 747
  • [2] TeXDYNA: Hierarchical Reinforcement Learning in Factored MDPs
    Kozlova, Olga
    Sigaud, Olivier
    Meyer, Christophe
    [J]. FROM ANIMALS TO ANIMATS 11, 2010, 6226 : 489 - +
  • [3] Near-optimal Reinforcement Learning in Factored MDPs
    Osband, Ian
    Van Roy, Benjamin
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [4] Model-based reinforcement learning in factored-state MDPs
    Strehl, Alexander L.
    [J]. 2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 103 - 110
  • [5] Automatic Feature Selection for Model-Based Reinforcement Learning in Factored MDPs
    Kroon, Mark
    Whiteson, Shimon
    [J]. EIGHTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2009, : 324 - 330
  • [6] Polynomial Time Reinforcement Learning in Factored State MDPs with Linear Value Functions
    Deng, Zihao
    Devic, Siddartha
    Juba, Brendan
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [7] Discovering hidden structure in factored MDPs
    Kolobov, Andrey
    Mausam
    Weld, Daniel S.
    [J]. ARTIFICIAL INTELLIGENCE, 2012, 189 : 19 - 47
  • [8] Reinforcement learning for MDPs with constraints
    Geibel, Peter
    [J]. MACHINE LEARNING: ECML 2006, PROCEEDINGS, 2006, 4212 : 646 - 653
  • [9] An Associative State-Space Metric for Learning in Factored MDPs
    Sequeira, Pedro
    Melo, Francisco S.
    Paiva, Ana
    [J]. PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2013, 2013, 8154 : 163 - 174
  • [10] Multiagent planning with factored MDPs
    Guestrin, C
    Koller, D
    Parr, R
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14, VOLS 1 AND 2, 2002, 14 : 1523 - 1530