Task-based dialogue policy learning based on diffusion models

Cited by: 0
Authors
Liu, Zhibin [1]
Pang, Rucai [1]
Dong, Zhaoan [1]
Affiliations
[1] Qufu Normal Univ, Sch Comp Sci, Yantai Rd, Rizhao 276826, Shandong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-domain dialogue; Reinforcement learning; Reward estimation; Behavioural cloning; Diffusion models;
DOI
10.1007/s10489-024-05810-6
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Task-based dialogue systems aim to help users meet their goals in as few dialogue turns as possible. As demand grows, dialogue tasks increasingly span multiple domains and become more complex and diverse. Achieving high performance at low computational cost has therefore become an essential requirement for multi-domain task-based dialogue systems. This paper proposes a new approach to guiding the dialogue policy. The method introduces a conditional diffusion model into the Q-learning algorithm of reinforcement learning, regularising the policy in the manner of diffusion Q-learning. The conditional diffusion model is used to learn the action-value function, to regularise and sample actions, and to feed the sampled actions into the policy update; a loss term that maximises the value of the actions is added to the policy update to improve learning efficiency. The proposed method combines the conditional diffusion model with the TD3 reinforcement learning algorithm as the dialogue policy, and uses inverse reinforcement learning to construct a reward estimator that supplies rewards for policy updates, in order to complete multi-domain dialogue tasks.
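The abstract describes a policy update that pairs a diffusion-based regularisation term with a term that maximises action value. The sketch below illustrates that general shape only; it is not the authors' implementation. It assumes the commonly used diffusion Q-learning form, in which a noise-prediction (behavioural cloning) loss is added to the negative mean Q-value of sampled actions. The names `noise_action` and `policy_loss`, the weight `eta`, and all numbers are hypothetical.

```python
import math
import random


def noise_action(a0, eps, alpha_bar):
    """Forward diffusion step: blend a clean action vector with Gaussian noise.

    alpha_bar is the cumulative noise-schedule coefficient in (0, 1];
    alpha_bar = 1 leaves the action unchanged.
    """
    return [math.sqrt(alpha_bar) * a + math.sqrt(1.0 - alpha_bar) * e
            for a, e in zip(a0, eps)]


def policy_loss(eps_pred, eps_true, q_values, eta=1.0):
    """Return (total, bc_term, q_term) for one batch.

    bc_term: mean squared error between predicted and true noise, which
             keeps sampled actions close to the behaviour data.
    q_term:  negative mean Q-value of the sampled actions, scaled by the
             assumed weight eta, so minimising the loss raises action value.
    """
    n = sum(len(row) for row in eps_true)
    bc_term = sum((p - t) ** 2
                  for row_p, row_t in zip(eps_pred, eps_true)
                  for p, t in zip(row_p, row_t)) / n
    q_term = -eta * sum(q_values) / len(q_values)
    return bc_term + q_term, bc_term, q_term


# Toy batch: 4 actions of dimension 3, with a predictor that is off by 0.1.
random.seed(0)
eps_true = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]
eps_pred = [[e + 0.1 for e in row] for row in eps_true]
q_values = [1.0, 2.0, 3.0, 4.0]  # hypothetical critic outputs

total, bc_term, q_term = policy_loss(eps_pred, eps_true, q_values, eta=0.5)
print(total, bc_term, q_term)
```

In a full system the predicted noise would come from a conditional network that takes the dialogue state, the noised action and the diffusion step as input, and the Q-values from TD3-style critics; both are replaced by fixed numbers here.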
Pages: 11752-11764
Page count: 13