Learning option MDPs from small data

Cited by: 0
Authors
Zehfroosh, Ashkan [1 ]
Tanner, Herbert G. [1 ]
Heinz, Jeffrey [2 ]
Affiliations
[1] Univ Delaware, Dept Mech Engn, Newark, DE 19716 USA
[2] SUNY Stony Brook, Dept Linguist & Inst Adv Computat Sci, Stony Brook, NY 11794 USA
Source
2018 ANNUAL AMERICAN CONTROL CONFERENCE (ACC), 2018
Keywords
ACQUISITION; INFANTS;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Learning from small data is a challenge that arises in applications of human-robot interaction (HRI) in the context of pediatric rehabilitation. Discrete models of computation such as a Markov decision process (MDP) can be used to capture the dynamics of HRI, but the parameters of those models are usually unknown and (human-)subject dependent. This paper combines an abstraction method for MDPs with a parameter estimation method, originally developed for natural language processing, that is designed specifically to operate on small data. The combination expedites learning from small data and yields more accurate models that support more effective decision-making. Numerical evidence in support of the approach is offered in a comparative study on a small grid-world example.
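The abstract does not spell out the estimator it borrows from natural language processing. As a minimal, purely illustrative sketch, the Python snippet below uses add-one (Laplace) smoothing, a standard NLP technique for sparse counts, as a stand-in for the paper's small-data estimator when fitting MDP transition probabilities from only a handful of observed transitions; the function and variable names are hypothetical and the paper's actual method may differ.

# Illustrative sketch only: Laplace (add-one) smoothing stands in for the
# NLP-style small-data estimator mentioned in the abstract; the paper's
# actual estimator may differ.
from collections import defaultdict

def estimate_transitions(samples, states, alpha=1.0):
    """Estimate P(s' | s, a) from a small set of (s, a, s') samples.

    alpha is a smoothing pseudo-count; alpha = 0 recovers plain
    maximum-likelihood estimation, which assigns probability zero to
    any transition never observed in the data.
    """
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for s, a, s_next in samples:
        counts[s][a][s_next] += 1.0

    P = {}
    for s, actions in counts.items():
        P[s] = {}
        for a, outcomes in actions.items():
            total = sum(outcomes.values()) + alpha * len(states)
            P[s][a] = {s2: (outcomes.get(s2, 0.0) + alpha) / total
                       for s2 in states}
    return P

# Tiny grid-world-style data set with only three observed transitions.
data = [("s0", "right", "s1"), ("s0", "right", "s1"), ("s1", "up", "s2")]
P = estimate_transitions(data, states=["s0", "s1", "s2"])
print(P["s0"]["right"])  # no next state gets probability exactly zero

With alpha = 1 and three states, the two observed s0 -right-> s1 transitions give P(s1 | s0, right) = 3/5, while the unseen outcomes s0 and s2 each receive 1/5 instead of zero, which is the point of smoothing when data are scarce.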
Pages: 252-257
Number of pages: 6
Related papers
50 records in total
  • [1] Learning Large Graph-Based MDPs With Historical Data
    Haksar, Ravi N.
    Schwager, Mac
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2022, 9 (03): : 1447 - 1458
  • [2] Learning sequential option hedging models from market data
    Nian, Ke
    Coleman, Thomas F.
    Li, Yuying
    JOURNAL OF BANKING & FINANCE, 2021, 133
  • [3] ε-MDPs: Learning in varying environments
    Szita, I
    Takács, B
    Lorincz, A
    JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (01) : 145 - 173
  • [4] Learning to Branch with Tree MDPs
    Scavuzzo, Lara
    Chen, Feng Yang
    Chetelat, Didier
    Gasse, Maxime
    Lodi, Andrea
    Yorke-Smith, Neil
    Aardal, Karen
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [5] Reinforcement learning for MDPs with constraints
    Geibel, Peter
    MACHINE LEARNING: ECML 2006, PROCEEDINGS, 2006, 4212 : 646 - 653
  • [6] Option Pricing Using Machine Learning with Intraday Data of TAIEX Option
    Wang, Chou-Wen
    Wu, Chin-Wen
    Chen, Po-Lin
    HCI IN BUSINESS, GOVERNMENT AND ORGANIZATIONS, PT II, HCIBGO 2023, 2023, 14039 : 214 - 224
  • [7] Efficient reinforcement learning in factored MDPs
    Kearns, M
    Koller, D
    IJCAI-99: PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOLS 1 & 2, 1999, : 740 - 747
  • [8] Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
    Sutton, RS
    Precup, D
    Singh, S
    ARTIFICIAL INTELLIGENCE, 1999, 112 (1-2) : 181 - 211
  • [9] Multitask reinforcement learning on the distribution of MDPs
    Tanaka, F
    Yamamura, M
    2003 IEEE INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN ROBOTICS AND AUTOMATION, VOLS I-III, PROCEEDINGS, 2003, : 1108 - 1113
  • [10] Expedited Learning in MDPs with Side Information
    Ornik, Melkior
    Fu, Jie
    Lauffer, Niklas T.
    Perera, W. K.
    Alshiekh, Mohammed
    Ono, Masahiro
    Topcu, Ufuk
    2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 1941 - 1948