Rescue Conversations from Dead-ends: Efficient Exploration for Task-oriented Dialogue Policy Optimization

Authors
Zhao, Yangyang [1, 2]
Dastani, Mehdi [2]
Long, Jinchuan [3]
Wang, Zhenyu [4]
Wang, Shihan [2]
Affiliations
[1] Changsha Univ Sci & Technol, Changsha, Peoples R China
[2] Univ Utrecht, Utrecht, Netherlands
[3] Cent South Univ, Changsha, Peoples R China
[4] South China Univ Technol, Guangzhou, Peoples R China
DOI
10.1162/tacl_a_00717
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Training a task-oriented dialogue policy using deep reinforcement learning is promising but requires extensive environment exploration. The amount of wasted invalid exploration makes policy learning inefficient. In this paper, we define and argue that dead-end states are important reasons for invalid exploration. When a conversation enters a dead-end state, regardless of the actions taken afterward, it will continue in a dead-end trajectory until the agent reaches a termination state or maximum turn. We propose a Dead-end Detection and Resurrection (DDR) method that detects dead-end states in an efficient manner and provides a rescue action to guide and correct the exploration direction. To prevent dialogue policies from repeating errors, DDR also performs dialogue data augmentation by adding relevant experiences that include dead-end states and penalties into the experience pool. We first validate the dead-end detection reliability and then demonstrate the effectiveness and generality of the method across various domains through experiments on four public dialogue datasets.
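The abstract's definition of a dead-end state can be illustrated with a toy example. The sketch below is only an illustration under stated assumptions and is not the DDR algorithm from the paper, whose detection procedure the abstract does not specify. It assumes a small, fully known, deterministic state graph (`TRANSITIONS`, hypothetical), treats a state as a dead end when the goal is unreachable from it, replaces an action that would enter a dead end with an alternative one, and stores the penalized transition in an experience pool. All names and the penalty value are hypothetical.

```python
from collections import deque

# Hypothetical toy dialogue graph: state -> {action: next_state}.
TRANSITIONS = {
    "start":      {"ask_slot": "slot_known", "wrong_offer": "dead"},
    "slot_known": {"book": "success", "wrong_offer": "dead"},
    "dead":       {"ask_slot": "dead", "book": "fail"},
    "success":    {},
    "fail":       {},
}
GOAL = "success"
DEAD_END_PENALTY = -1.0  # assumed value, not taken from the paper


def find_dead_ends(transitions, goal):
    """Return the states from which no action sequence reaches the goal."""
    # Build reverse edges, then search backward from the goal.
    predecessors = {state: set() for state in transitions}
    for state, actions in transitions.items():
        for next_state in actions.values():
            predecessors[next_state].add(state)
    can_reach_goal = {goal}
    queue = deque([goal])
    while queue:
        current = queue.popleft()
        for prev in predecessors[current]:
            if prev not in can_reach_goal:
                can_reach_goal.add(prev)
                queue.append(prev)
    return set(transitions) - can_reach_goal


def step_with_rescue(state, action, transitions, dead_ends, pool):
    """Apply an action; if it enters a dead end, record a penalized
    experience and take an alternative (rescue) action when one exists."""
    next_state = transitions[state][action]
    if next_state not in dead_ends:
        return next_state, action
    # Augment the experience pool with the penalized dead-end transition.
    pool.append((state, action, DEAD_END_PENALTY, next_state))
    for alt_action, alt_next in transitions[state].items():
        if alt_next not in dead_ends:
            return alt_next, alt_action
    return next_state, action  # no rescue action available


dead_ends = find_dead_ends(TRANSITIONS, GOAL)
experience_pool = []
result = step_with_rescue("start", "wrong_offer", TRANSITIONS,
                          dead_ends, experience_pool)
```

In a real dialogue environment the transition graph is not known in advance, so exact reachability analysis like this would not be available; the sketch only makes the definition concrete.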
Pages: 1578-1596
Page count: 19