POEM: A Personalized Online Education Scheme Based on Reinforcement Learning

Cited by: 2
Authors
Wang, Yufeng [1]
Cai, Wenjie [1]
Chen, Meijuan [1]
Shen, Jianhua [1]
Affiliation
[1] Nanjing Univ Posts & Telecomm, Coll Telecommun & Informat Engn, Nanjing, Peoples R China
Keywords
personalized education; reinforcement learning; Gaussian process; multi-armed bandit; zone of proximal development (ZPD);
DOI
10.1109/TALE48869.2020.9368369
Chinese Library Classification (CLC) number
G40 [Education];
Discipline classification codes
040101; 120403
Abstract
As online e-learning systems become more prevalent, there is a growing need for them to accommodate individual differences among students. According to the concept of the zone of proximal development (ZPD), it is imperative to provide online students with educational content that is neither too easy nor too difficult, but slightly beyond their current abilities. However, following the ZPD rule is challenging in online e-learning systems for two reasons: the system does not know the abilities of online students a priori, especially those of newly arrived students; and the exact relationship between students' feedback on teaching and their abilities (i.e., the reward/gain function) is extremely complicated and may even be unknown to the students themselves. To address these issues, this paper proposes POEM, a personalized online education scheme that aims to maximize students' cumulative learning gains over multiple rounds. Specifically, instead of assuming any specific form for the reward function, we first estimate the unknown reward function from noisy samples using a Gaussian process (GP) model. Then, a multi-armed bandit based algorithm selects teaching content of adaptive difficulty, balancing exploration and exploitation. Simulation results demonstrate the effectiveness of the proposed method.
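The abstract describes two steps: estimating an unknown reward function from noisy samples with a Gaussian process, then choosing a difficulty level with a multi-armed bandit rule that balances exploration and exploitation. The sketch below illustrates one standard way to combine the two, GP-UCB over a discrete grid of difficulty levels. It is not the authors' POEM algorithm. The RBF kernel, the noise level, the exploration weight `beta`, the difficulty grid, and the ZPD-shaped `hypothetical_gain` function (peaking half a level above an ability the algorithm never sees) are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel matrix between 1-D arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise=0.1, length_scale=1.0):
    """Zero-mean GP posterior mean and standard deviation at x_test."""
    K = rbf_kernel(x_train, x_train, length_scale) + noise**2 * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length_scale)
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(x_test, x_test, length_scale).diagonal() - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))


def hypothetical_gain(difficulty, ability=4.0, stretch=0.5):
    """ASSUMED ZPD-shaped reward: largest when content sits slightly above ability.
    The bandit never sees `ability`; it only observes noisy rewards."""
    return np.exp(-0.5 * (difficulty - (ability + stretch)) ** 2)


def gp_ucb_teach(levels, rounds=30, beta=2.0, noise=0.1, seed=0):
    """Pick one difficulty level per round by maximizing mean + sqrt(beta) * std."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for _ in range(rounds):
        if not xs:
            choice = levels[len(levels) // 2]  # no data yet: start mid-range
        else:
            mean, std = gp_posterior(np.array(xs), np.array(ys), levels, noise)
            choice = levels[np.argmax(mean + np.sqrt(beta) * std)]  # UCB rule
        reward = hypothetical_gain(choice) + rng.normal(0.0, noise)  # noisy feedback
        xs.append(choice)
        ys.append(reward)
    return np.array(xs), np.array(ys)


levels = np.arange(0.0, 10.5, 0.5)  # candidate difficulty levels 0.0, 0.5, ..., 10.0
picks, gains = gp_ucb_teach(levels)
mean, _ = gp_posterior(picks, gains, levels)
print("difficulty chosen in the last five rounds:", picks[-5:])
print("estimated best difficulty:", levels[np.argmax(mean)])
```

With a fixed `beta` this is the plain GP-UCB rule; the exploration weight is often increased slowly with the round number instead, and a real system would replace `hypothetical_gain` with observed student feedback.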
Pages: 474-481
Page count: 8