Hierarchical Dialogue Optimization Using Semi-Markov Decision Processes

Cited by: 0
Authors
Cuayahuitl, Heriberto [1]
Renals, Steve [1]
Lemon, Oliver [2]
Shimodaira, Hiroshi [1]
Affiliations
[1] Univ Edinburgh, CSTR, Sch Informat, 2 Buccleuch Pl, Edinburgh EH8 9LW, Midlothian, Scotland
[2] Univ Edinburgh, HCRC, Sch Informat, Edinburgh EH8 9LW, Midlothian, Scotland
Keywords
Spoken dialogue systems; semi-Markov decision processes; hierarchical reinforcement learning
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses the problem of dialogue optimization over large search spaces. We propose to learn dialogue strategies using multiple Semi-Markov Decision Processes and hierarchical reinforcement learning. The approach factorizes state variables and actions in order to learn a hierarchy of policies. Our experiments use a simulated flight booking dialogue system and compare flat and hierarchical reinforcement learning. The results show that the proposed approach reduced the search space dramatically (by 99.36%) and converged four orders of magnitude faster than flat reinforcement learning, with a very small loss in optimality (0.3 system turns on average). The learnt policies also outperformed a hand-crafted policy under three different conditions of ASR confidence levels. The approach is appealing for dialogue optimization because of its faster learning, reusable subsolutions, and scalability to larger problems.
Pages: 1413 / +
Page count: 2