Hierarchical Reinforcement Learning With Guidance for Multi-Domain Dialogue Policy

Cited by: 7
|
Authors
Rohmatillah, Mahdin [1 ]
Chien, Jen-Tzung [2 ]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, EECS Int Grad Program, Hsinchu 30010, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Inst Elect & Comp Engn, Hsinchu 30010, Taiwan
Keywords
Task analysis; Optimization; Training; Pipelines; Reinforcement learning; Costs; Transformers; Dialogue system; policy optimization; guidance learning; hierarchical reinforcement learning;
DOI
10.1109/TASLP.2023.3235202
Chinese Library Classification
O42 [Acoustics];
Discipline Classification Codes
070206; 082403;
Abstract
Achieving high performance in a multi-domain dialogue system with low computation is undoubtedly challenging. Previous works applying an end-to-end approach have been very successful; however, the computational cost remains a major issue, since a large language model such as GPT-2 is required. Meanwhile, optimizing the individual components of the dialogue system has not shown promising results, especially for the dialogue management component, due to the complexity of multi-domain state and action representations. To cope with these issues, this article presents an efficient guidance learning scheme in which imitation learning and hierarchical reinforcement learning (HRL) with a human in the loop are performed to achieve high performance with an inexpensive dialogue agent. Behavior cloning with auxiliary tasks is exploited to identify the important features in the latent representation. In particular, the proposed HRL is designed to treat each goal of a dialogue with a corresponding sub-policy, so as to provide efficient dialogue policy learning by utilizing guidance from humans, through action pruning and action evaluation, as well as the reward obtained from interaction with the simulated user in the environment. Experimental results on the ConvLab-2 framework show that the proposed method achieves state-of-the-art performance in dialogue policy optimization and outperforms GPT-2-based solutions in end-to-end system evaluation.
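The abstract's core mechanism, a high-level policy dispatching each dialogue goal to a per-domain sub-policy whose action set is pruned by human guidance, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the class name `GuidedHierarchicalPolicy`, the dictionary-based state, and the domain/action names are all hypothetical, and the learned high-level and low-level policies are replaced by simple stand-ins.

```python
import random

class GuidedHierarchicalPolicy:
    """Sketch of HRL with guidance: a top-level policy picks the active
    domain (sub-goal), then a per-domain sub-policy picks a dialogue act
    from an action set pruned in advance by human guidance."""

    def __init__(self, domain_actions, pruned, seed=0):
        self.domain_actions = domain_actions  # domain -> candidate dialogue acts
        self.pruned = pruned                  # domain -> acts removed by guidance
        self.rng = random.Random(seed)

    def select_domain(self, state):
        # Stand-in for the learned high-level policy: pick the first
        # domain whose goal is not yet satisfied in the belief state.
        for domain, done in state.items():
            if not done:
                return domain
        return None  # all goals satisfied

    def select_action(self, state):
        domain = self.select_domain(state)
        if domain is None:
            return ("general", "bye")  # end the dialogue
        # Action pruning: guidance removes implausible acts before the
        # sub-policy (here a uniform stand-in) samples one.
        allowed = [a for a in self.domain_actions[domain]
                   if a not in self.pruned.get(domain, set())]
        return (domain, self.rng.choice(allowed))

# Demo: two domains; guidance prunes a premature "book" act for hotels.
policy = GuidedHierarchicalPolicy(
    domain_actions={"hotel": ["inform-price", "request-area", "book"],
                    "taxi": ["request-dest", "book"]},
    pruned={"hotel": {"book"}},
)
turn = policy.select_action({"hotel": False, "taxi": False})
```

In the paper's full setting, the stand-ins above would be learned networks trained with behavior cloning plus the HRL rewards, and action evaluation would score the surviving candidates rather than sampling uniformly.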
Pages: 748-761
Page count: 14