Rescue Conversations from Dead-ends: Efficient Exploration for Task-oriented Dialogue Policy Optimization

Cited by: 0
Authors
Zhao, Yangyang [1, 2]
Dastani, Mehdi [2]
Long, Jinchuan [3]
Wang, Zhenyu [4]
Wang, Shihan [2]
Affiliations
[1] Changsha Univ Sci & Technol, Changsha, Peoples R China
[2] Univ Utrecht, Utrecht, Netherlands
[3] Cent South Univ, Changsha, Peoples R China
[4] South China Univ Technol, Guangzhou, Peoples R China
DOI
10.1162/tacl_a_00717
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Training a task-oriented dialogue policy with deep reinforcement learning is promising but requires extensive environment exploration. Much of that exploration is invalid and wasted, which makes policy learning inefficient. In this paper, we define dead-end states and argue that they are an important cause of invalid exploration. Once a conversation enters a dead-end state, it continues along a dead-end trajectory regardless of the actions taken afterward, until the agent reaches a termination state or the maximum number of turns. We propose a Dead-end Detection and Resurrection (DDR) method that efficiently detects dead-end states and provides a rescue action to guide and correct the exploration direction. To prevent dialogue policies from repeating errors, DDR also performs dialogue data augmentation by adding relevant experiences that include dead-end states and penalties to the experience pool. We first validate the reliability of dead-end detection and then demonstrate the effectiveness and generality of the method across various domains through experiments on four public dialogue datasets.
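The three ideas named in the abstract (detecting a dead-end, substituting a rescue action, and storing the penalized experience) can be illustrated with a toy sketch. This is not the DDR implementation: the abstract does not say how detection works, so the sketch assumes a small, fully known transition graph and treats a state as a dead-end when the goal is no longer reachable from it. The graph, the state and action names, the penalty value, and the helper functions `live_states` and `step_with_rescue` are all hypothetical.

```python
from collections import deque

# Hypothetical toy dialogue graph: state -> {action: next_state}.
TRANSITIONS = {
    "start": {"ask_slot": "slot_filled", "wrong_inform": "dead"},
    "slot_filled": {"book": "success", "wrong_inform": "dead"},
    "dead": {"ask_slot": "dead", "book": "dead"},  # no way back out
    "success": {},
}
GOAL = "success"
DEAD_END_PENALTY = -1.0  # assumed value, not taken from the paper


def live_states(transitions, goal):
    """Return the states from which the goal is still reachable.

    Assumption: reachability in a known graph stands in for the
    paper's dead-end detector, which the abstract does not describe.
    """
    parents = {}
    for state, actions in transitions.items():
        for nxt in actions.values():
            parents.setdefault(nxt, set()).add(state)
    seen = {goal}
    queue = deque([goal])
    while queue:  # backward breadth-first search from the goal
        cur = queue.popleft()
        for parent in parents.get(cur, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def step_with_rescue(state, action, transitions, live, pool):
    """Take one step; if it enters a dead-end, log it and rescue."""
    nxt = transitions[state][action]
    if nxt in live:
        return action, nxt
    # Augmentation: store the failed step with a penalty, marked terminal.
    pool.append((state, action, DEAD_END_PENALTY, nxt, True))
    # Rescue: replace the action with one that keeps the goal reachable.
    for alt, alt_next in transitions[state].items():
        if alt_next in live:
            return alt, alt_next
    return action, nxt  # no rescue available from this state


live = live_states(TRANSITIONS, GOAL)
pool = []
rescued = step_with_rescue("start", "wrong_inform", TRANSITIONS, live, pool)
```

In this toy run, `rescued` is `("ask_slot", "slot_filled")` and `pool` holds the single penalized transition into `"dead"`. A learned policy would have no such graph available, so a practical detector would have to estimate dead-ends from dialogue state rather than compute exact reachability.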
Pages: 1578-1596
Page count: 19