Learning option MDPs from small data

Cited by: 0
Authors
Zehfroosh, Ashkan [1 ]
Tanner, Herbert G. [1 ]
Heinz, Jeffrey [2 ]
Affiliations
[1] Univ Delaware, Dept Mech Engn, Newark, DE 19716 USA
[2] SUNY Stony Brook, Dept Linguist & Inst Adv Computat Sci, Stony Brook, NY 11794 USA
Source
2018 ANNUAL AMERICAN CONTROL CONFERENCE (ACC) | 2018
Keywords
ACQUISITION; INFANTS
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812
Abstract
Learning from small data is a challenge that arises in applications of human-robot interaction (HRI) in the context of pediatric rehabilitation. Discrete models of computation such as a Markov decision process (MDP) can capture the dynamics of HRI, but the parameters of those models are usually unknown and (human) subject dependent. This paper combines an abstraction method for MDPs with a parameter estimation method, originally developed for natural language processing, that is designed specifically to operate on small data. The combination expedites learning from small data and yields more accurate models that lend themselves to more effective decision-making. Numerical evidence in support of the approach is offered in a comparative study on a small grid-world example.
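
As a rough, self-contained illustration of the estimation problem the abstract describes, the sketch below fits MDP transition probabilities from a handful of observed (state, action, next-state) samples. It uses simple additive (Laplace) smoothing as a stand-in for the paper's NLP-derived estimator, which this record does not detail; the function names and data are hypothetical.

    from collections import defaultdict

    def estimate_transitions(samples, states, actions, alpha=1.0):
        # Count observed (s, a) -> s' transitions from the small data set.
        counts = defaultdict(lambda: defaultdict(float))
        for s, a, s_next in samples:
            counts[(s, a)][s_next] += 1.0
        # Additive (Laplace) smoothing: every possible successor state
        # receives pseudo-count alpha, so transitions never observed in
        # the small sample still get nonzero probability mass.
        P = {}
        for s in states:
            for a in actions:
                total = sum(counts[(s, a)].values()) + alpha * len(states)
                P[(s, a)] = {s2: (counts[(s, a)][s2] + alpha) / total
                             for s2 in states}
        return P

    # Toy two-state example with only three observations:
    samples = [(0, 'go', 1), (0, 'go', 1), (1, 'go', 0)]
    P = estimate_transitions(samples, states=[0, 1], actions=['go'])
    print(P[(0, 'go')])  # {0: 0.25, 1: 0.75} with alpha = 1.0

The design point the sketch shows is that with only a few samples per state-action pair, unsmoothed maximum-likelihood estimates assign zero probability to unseen transitions; smoothing (here Laplace, in the paper a more sophisticated NLP-style estimator) avoids that failure mode.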
Pages: 252 - 257
Number of pages: 6
Related Papers
50 records in total
  • [41] Near-optimal Reinforcement Learning in Factored MDPs
    Osband, Ian
    Van Roy, Benjamin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [42] Reinforcement learning for MDPs using temporal difference schemes
    Thomas, A
    Marcus, SI
    PROCEEDINGS OF THE 36TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5, 1997, : 577 - 583
  • [43] Path Consistency Learning in Tsallis Entropy Regularized MDPs
    Nachum, Ofir
    Chow, Yinlam
    Ghavamzadeh, Mohammad
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [44] Learning models of relational MDPs using graph kernels
    Halbritter, Florian
    Geibel, Peter
    MICAI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2007, 4827 : 409 - +
  • [45] Exploiting Additive Structure in Factored MDPs for Reinforcement Learning
    Degris, Thomas
    Sigaud, Olivier
    Wuillemin, Pierre-Henri
    RECENT ADVANCES IN REINFORCEMENT LEARNING, 2008, 5323 : 15 - 26
  • [46] Active Learning from Crowds with Unsure Option
    Zhong, Jinhong
    Tang, Ke
    Zhou, Zhi-Hua
    PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI), 2015, : 1061 - 1067
  • [47] States evolution in Θ(λ)-learning based on logical MDPs with negation
    Song Zhiwei
    Chen Xiaoping
    2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007, : 2345 - 2350
  • [48] Learning in Online MDPs: Is there a Price for Handling the Communicating Case?
    Chandrasekaran, Gautam
    Tewari, Ambuj
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 293 - 302
  • [49] Planning and Learning for Decentralized MDPs with Event Driven Rewards
    Gupta, Tarun
    Kumar, Akshat
    Paruchuri, Praveen
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 6186 - 6194
  • [50] Inferring financial bubbles from option data
    Jarrow, Robert A.
    Kwok, Simon S.
    JOURNAL OF APPLIED ECONOMETRICS, 2021, : 1013 - 1046