Rescue Conversations from Dead-ends: Efficient Exploration for Task-oriented Dialogue Policy Optimization

Cited by: 0
Authors
Zhao, Yangyang [1, 2]
Dastani, Mehdi [2]
Long, Jinchuan [3]
Wang, Zhenyu [4]
Wang, Shihan [2]
Affiliations
[1] Changsha Univ Sci & Technol, Changsha, Peoples R China
[2] Univ Utrecht, Utrecht, Netherlands
[3] Cent South Univ, Changsha, Peoples R China
[4] South China Univ Technol, Guangzhou, Peoples R China
DOI
10.1162/tacl_a_00717
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Training a task-oriented dialogue policy using deep reinforcement learning is promising but requires extensive environment exploration. The amount of wasted invalid exploration makes policy learning inefficient. In this paper, we define and argue that dead-end states are important reasons for invalid exploration. When a conversation enters a dead-end state, regardless of the actions taken afterward, it will continue in a dead-end trajectory until the agent reaches a termination state or maximum turn. We propose a Dead-end Detection and Resurrection (DDR) method that detects dead-end states in an efficient manner and provides a rescue action to guide and correct the exploration direction. To prevent dialogue policies from repeating errors, DDR also performs dialogue data augmentation by adding relevant experiences that include dead-end states and penalties into the experience pool. We first validate the dead-end detection reliability and then demonstrate the effectiveness and generality of the method across various domains through experiments on four public dialogue datasets.
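The idea of a dead-end state and a rescue action can be illustrated with a minimal sketch. This is not the DDR method itself, whose detection procedure is not described in this record. The sketch assumes a small, fully known, deterministic state graph and treats a state as a dead end when no success state is reachable from it. The transition table, the state and action names, the penalty value, and both function names are hypothetical.

```python
from collections import deque

# Hypothetical toy dialogue MDP: TRANSITIONS[state][action] -> next state.
TRANSITIONS = {
    "start":   {"ask_slot": "partial", "guess": "wrong"},
    "partial": {"confirm": "success", "guess": "wrong"},
    "wrong":   {"repeat": "wrong", "give_up": "fail"},
    "success": {},
    "fail":    {},
}
SUCCESS = {"success"}
DEAD_END_PENALTY = -1.0  # assumed value, chosen only for illustration


def find_dead_ends(transitions, success):
    """Return the states from which no success state is reachable."""
    # Build reverse edges, then walk backward from the success states.
    reverse = {s: set() for s in transitions}
    for s, actions in transitions.items():
        for nxt in actions.values():
            reverse[nxt].add(s)
    alive = set(success)
    queue = deque(success)
    while queue:
        s = queue.popleft()
        for prev in reverse[s]:
            if prev not in alive:
                alive.add(prev)
                queue.append(prev)
    return set(transitions) - alive


def step_with_rescue(state, action, dead_ends, pool):
    """Apply an action; if it enters a dead end, penalize it and rescue."""
    nxt = TRANSITIONS[state][action]
    if nxt not in dead_ends:
        return action, nxt
    # Augment the experience pool with the penalized dead-end transition.
    pool.append((state, action, DEAD_END_PENALTY, nxt))
    # Rescue: pick any alternative action that avoids a dead end.
    for alt, alt_next in TRANSITIONS[state].items():
        if alt_next not in dead_ends:
            return alt, alt_next
    return action, nxt  # no rescue available from this state


dead_ends = find_dead_ends(TRANSITIONS, SUCCESS)
pool = []
chosen_action, next_state = step_with_rescue("start", "guess", dead_ends, pool)
```

In this toy graph the exploratory action `guess` would enter a dead end, so the penalized transition is stored in the pool and the step is redirected to an action that keeps the conversation recoverable. A learned policy operating on real dialogue states would not have the full transition graph available, so detection there would have to work differently.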
Pages: 1578-1596
Number of pages: 19