Hierarchical Dialogue Optimization Using Semi-Markov Decision Processes

Cited by: 0

Authors:

Cuayahuitl, Heriberto [1]

Renals, Steve [1]

Lemon, Oliver [2]

Shimodaira, Hiroshi [1]

Affiliations:

[1] Univ Edinburgh, CSTR, Sch Informat, 2 Buccleuch Pl, Edinburgh EH8 9LW, Midlothian, Scotland

[2] Univ Edinburgh, HCRC, Sch Informat, Edinburgh EH8 9LW, Midlothian, Scotland

Source:

INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4 | 2007

Keywords:

Spoken dialogue systems; semi-Markov decision processes; hierarchical reinforcement learning

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper addresses the problem of dialogue optimization in large search spaces. To this end, we propose learning dialogue strategies using multiple Semi-Markov Decision Processes and hierarchical reinforcement learning. The approach factorizes state variables and actions in order to learn a hierarchy of policies. Our experiments are based on a simulated flight booking dialogue system and compare flat with hierarchical reinforcement learning. The results show that the proposed approach reduced the search space dramatically (by 99.36%) and converged four orders of magnitude faster than flat reinforcement learning, with a very small loss in optimality (on average 0.3 system turns). The learnt policies also outperformed a hand-crafted one under three different conditions of ASR confidence levels. The approach is appealing for dialogue optimization because of its faster learning, reusable subsolutions, and scalability to larger problems.
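The search space reduction described in the abstract comes from factorizing state variables and actions across a hierarchy of subtasks. The following is a minimal sketch of that counting argument only. Every variable name, value count, action count, and subtask below is a hypothetical illustration, not the configuration used in the paper, so the resulting percentage will not match the reported 99.36%.

```python
from math import prod

# All names, value counts, and action counts below are illustrative
# assumptions, not the configuration used in the paper.
FLAT_VARIABLES = {
    "departure_city": 4,
    "destination_city": 4,
    "date": 4,
    "time": 4,
    "airline": 4,
    "confirmation_status": 3,
}
FLAT_ACTIONS = 10

# Each hypothetical subtask maps to (its own state variables, its action count).
SUBTASKS = {
    "root": ({"subtask_progress": 8}, 3),
    "get_cities": (
        {"departure_city": 4, "destination_city": 4, "confirmation_status": 3},
        4,
    ),
    "get_date_time": ({"date": 4, "time": 4, "confirmation_status": 3}, 4),
    "get_airline": ({"airline": 4, "confirmation_status": 3}, 3),
}


def flat_size(variables, n_actions):
    """State-action pairs when a single policy sees every variable jointly."""
    return prod(variables.values()) * n_actions


def hierarchical_size(subtasks):
    """State-action pairs summed over subtasks, each with its own variables."""
    return sum(
        prod(variables.values()) * n_actions
        for variables, n_actions in subtasks.values()
    )


def reduction_percent(flat, hierarchical):
    """Relative reduction of the hierarchical space versus the flat one."""
    return 100.0 * (1.0 - hierarchical / flat)


if __name__ == "__main__":
    flat = flat_size(FLAT_VARIABLES, FLAT_ACTIONS)
    hier = hierarchical_size(SUBTASKS)
    print(f"flat: {flat}, hierarchical: {hier}, "
          f"reduction: {reduction_percent(flat, hier):.2f}%")
```

The flat count grows as a product over all variables, while the hierarchical count grows as a sum over subtasks. This is why a factorized hierarchy can shrink the space by a large fraction even in a toy setting like this one.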

Pages: 1413 / +

Page count: 2
