Learning option MDPs from small data

Cited by: 0
Authors
Zehfroosh, Ashkan [1 ]
Tanner, Herbert G. [1 ]
Heinz, Jeffrey [2 ]
Affiliations
[1] Univ Delaware, Dept Mech Engn, Newark, DE 19716 USA
[2] SUNY Stony Brook, Dept Linguist & Inst Adv Computat Sci, Stony Brook, NY 11794 USA
Source
2018 ANNUAL AMERICAN CONTROL CONFERENCE (ACC) | 2018
Keywords
ACQUISITION; INFANTS
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812
Abstract
Learning from small data is a challenge that arises in applications of human-robot interaction (HRI) in the context of pediatric rehabilitation. Discrete models of computation such as a Markov decision process (MDP) can capture the dynamics of HRI, but the parameters of those models are usually unknown and (human) subject dependent. This paper combines an abstraction method for MDPs with a parameter estimation method, originally developed for natural language processing, that is designed specifically to operate on small data. The combination expedites learning from small data and yields more accurate models that lend themselves to more effective decision-making. Numerical evidence in support of the approach is offered in a comparative study on a small grid-world example.
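
As a rough, self-contained illustration of the estimation problem the abstract describes, the sketch below fits MDP transition probabilities from a handful of observed (state, action, next-state) samples. It uses simple additive (Laplace) smoothing as a stand-in for the paper's NLP-derived estimator, which this record does not detail; the function names and data are hypothetical.

    from collections import defaultdict

    def estimate_transitions(samples, states, actions, alpha=1.0):
        # Count observed (s, a) -> s' transitions from the small data set.
        counts = defaultdict(lambda: defaultdict(float))
        for s, a, s_next in samples:
            counts[(s, a)][s_next] += 1.0
        # Additive (Laplace) smoothing: every possible successor state
        # receives pseudo-count alpha, so transitions never observed in
        # the small sample still get nonzero probability mass.
        P = {}
        for s in states:
            for a in actions:
                total = sum(counts[(s, a)].values()) + alpha * len(states)
                P[(s, a)] = {s2: (counts[(s, a)][s2] + alpha) / total
                             for s2 in states}
        return P

    # Toy two-state example with only three observations:
    samples = [(0, 'go', 1), (0, 'go', 1), (1, 'go', 0)]
    P = estimate_transitions(samples, states=[0, 1], actions=['go'])
    print(P[(0, 'go')])  # {0: 0.25, 1: 0.75} with alpha = 1.0

The design point the sketch shows is that with only a few samples per state-action pair, unsmoothed maximum-likelihood estimates assign zero probability to unseen transitions; smoothing (here Laplace, in the paper a more sophisticated NLP-style estimator) avoids that failure mode.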
Pages: 252 - 257
Number of pages: 6
Related Papers
50 records in total
  • [41] Near-optimal Reinforcement Learning in Factored MDPs
    Osband, Ian
    Van Roy, Benjamin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [42] Reinforcement learning for MDPs using temporal difference schemes
    Thomas, A
    Marcus, SI
    PROCEEDINGS OF THE 36TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5, 1997, : 577 - 583
  • [43] Path Consistency Learning in Tsallis Entropy Regularized MDPs
    Nachum, Ofir
    Chow, Yinlam
    Ghavamzadeh, Mohammad
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [44] Learning models of relational MDPs using graph kernels
    Halbritter, Florian
    Geibel, Peter
    MICAI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2007, 4827 : 409 - +
  • [45] Exploiting Additive Structure in Factored MDPs for Reinforcement Learning
    Degris, Thomas
    Sigaud, Olivier
    Wuillemin, Pierre-Henri
    RECENT ADVANCES IN REINFORCEMENT LEARNING, 2008, 5323 : 15 - 26
  • [46] Active Learning from Crowds with Unsure Option
    Zhong, Jinhong
    Tang, Ke
    Zhou, Zhi-Hua
    PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI), 2015, : 1061 - 1067
  • [47] States evolution in Θ(λ)-learning based on logical MDPs with negation
    Song Zhiwei
    Chen Xiaoping
    2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007, : 2345 - 2350
  • [48] Learning in Online MDPs: Is there a Price for Handling the Communicating Case?
    Chandrasekaran, Gautam
    Tewari, Ambuj
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 293 - 302
  • [49] Planning and Learning for Decentralized MDPs with Event Driven Rewards
    Gupta, Tarun
    Kumar, Akshat
    Paruchuri, Praveen
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 6186 - 6194
  • [50] Inferring financial bubbles from option data
    Jarrow, Robert A.
    Kwok, Simon S.
    JOURNAL OF APPLIED ECONOMETRICS, 2021, : 1013 - 1046