Hierarchical Reinforcement Learning With Guidance for Multi-Domain Dialogue Policy

Cited by: 7
Authors
Rohmatillah, Mahdin [1]
Chien, Jen-Tzung [2]
Affiliations
[1] Natl Yang Ming Chiao Tung Univ, EECS Int Grad Program, Hsinchu 30010, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Inst Elect & Comp Engn, Hsinchu 30010, Taiwan
Keywords
Task analysis; Optimization; Training; Pipelines; Reinforcement learning; Costs; Transformers; Dialogue system; policy optimization; guidance learning; hierarchical reinforcement learning
DOI
10.1109/TASLP.2023.3235202
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification code
070206; 082403
Abstract
Achieving high performance in a multi-domain dialogue system at low computational cost is challenging. Previous works applying an end-to-end approach have been very successful, but their computational cost remains a major issue because they require a large language model based on GPT-2. Meanwhile, optimizing individual components of the dialogue system has not shown promising results, especially for dialogue management, owing to the complexity of multi-domain state and action representations. To cope with these issues, this article presents an efficient guidance learning method in which imitation learning and hierarchical reinforcement learning (HRL) with a human in the loop are combined to achieve high performance with an inexpensive dialogue agent. Behavior cloning with auxiliary tasks is used to identify the important features in the latent representation. In particular, the proposed HRL assigns each goal of a dialogue to a corresponding sub-policy, and learns the dialogue policy efficiently by using guidance from a human, through action pruning and action evaluation, together with the reward obtained from interaction with a simulated user in the environment. Experimental results on the ConvLab-2 framework show that the proposed method achieves state-of-the-art performance in dialogue policy optimization and outperforms GPT-2 based solutions in end-to-end system evaluation.
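The abstract's idea of routing each dialogue goal to its own sub-policy and pruning actions with human guidance can be illustrated with a small sketch. This is not the authors' implementation: the domain names, action names, scores, and pruning rules below are all hypothetical, and fixed score tables stand in for learned sub-policies.

```python
from typing import Dict, List, Set

# Hypothetical per-domain action scores standing in for learned sub-policies.
SUB_POLICY_SCORES: Dict[str, Dict[str, float]] = {
    "hotel": {"request_area": 0.2, "inform_price": 0.5, "book": 0.9},
    "train": {"request_day": 0.6, "inform_time": 0.4, "book": 0.8},
}

# Hypothetical human guidance: an action is pruned until its required slot is filled.
PRUNE_RULES: Dict[str, Dict[str, str]] = {
    "hotel": {"book": "area"},  # assumption: booking needs the area first
    "train": {"book": "day"},   # assumption: booking needs the travel day first
}


def allowed_actions(domain: str, filled_slots: Set[str]) -> List[str]:
    """Return the actions of the domain's sub-policy that survive pruning."""
    rules = PRUNE_RULES.get(domain, {})
    return [
        action
        for action in SUB_POLICY_SCORES[domain]
        if action not in rules or rules[action] in filled_slots
    ]


def select_action(domain: str, filled_slots: Set[str]) -> str:
    """Pick the highest-scoring action among those not pruned."""
    scores = SUB_POLICY_SCORES[domain]
    candidates = allowed_actions(domain, filled_slots)
    return max(candidates, key=lambda action: scores[action])


if __name__ == "__main__":
    print(select_action("hotel", set()))      # "book" is pruned here
    print(select_action("hotel", {"area"}))   # "book" is now allowed
```

In this toy setting the pruning step only narrows the candidate set; the action evaluation and simulated-user reward described in the abstract would be used to train the scores and are outside the scope of the sketch.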
Pages: 748-761
Number of pages: 14