Budgeted Policy Learning for Task-Oriented Dialogue Systems

Cited by: 0
Authors
Zhang, Zhirui [1]
Li, Xiujun [2, 3]
Gao, Jianfeng [2]
Chen, Enhong [1]
Affiliations
[1] University of Science and Technology of China, Hefei, Anhui, People's Republic of China
[2] Microsoft Research AI, Redmond, WA, USA
[3] University of Washington, Seattle, WA 98195, USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a new approach that extends Deep Dyna-Q (DDQ) with Budget-Conscious Scheduling (BCS) to make the best use of a small, fixed number of user interactions (the budget) when learning task-oriented dialogue agents. BCS consists of (1) a Poisson-based global scheduler that allocates the budget over the different stages of training; (2) a controller that decides at each training step whether the agent is trained on real or simulated experiences; and (3) a user goal sampling module that generates the experiences most effective for policy learning. Experiments on a movie-ticket booking task with simulated and real users show that the approach yields significant improvements in success rate over state-of-the-art baselines under the same fixed budget.
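The abstract states only that the global scheduler is "Poisson-based"; it does not give the formula. As a rough illustration of the idea, and not the authors' formulation, the sketch below splits a fixed interaction budget across training stages in proportion to a Poisson probability mass function over the stage index. The function names, the rate parameter `lam`, and the largest-remainder rounding are all assumptions introduced here.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of k events under a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def allocate_budget(total_budget: int, num_stages: int, lam: float) -> list[int]:
    """Split total_budget over num_stages, weighted by Poisson(lam) at each stage index.

    Hypothetical helper: the weighting scheme is an assumption, not the paper's.
    """
    weights = [poisson_pmf(k, lam) for k in range(num_stages)]
    norm = sum(weights)
    # Ideal (fractional) share of the budget for each stage.
    raw = [total_budget * w / norm for w in weights]
    alloc = [math.floor(r) for r in raw]
    # Hand out the units lost to rounding, largest fractional part first.
    leftover = total_budget - sum(alloc)
    by_remainder = sorted(range(num_stages), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    # Example: 300 real-user dialogues spread over 10 training stages.
    print(allocate_budget(total_budget=300, num_stages=10, lam=3.0))
```

With a small `lam`, most of the budget lands in the early stages and tapers off later; whether that matches the schedule used in the paper cannot be confirmed from the abstract alone.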
Pages: 3742 - 3751
Number of pages: 10
Related papers
  • [1] Domain Complexity and Policy Learning in Task-Oriented Dialogue Systems
    Papangelis, Alexandros
    Ultes, Stefan
    Stylianou, Yannis
    [J]. ADVANCED SOCIAL INTERACTION WITH AGENTS, 2019, 510 : 63 - 69
  • [2] Continual Learning in Task-Oriented Dialogue Systems
    Madotto, Andrea
    Lin, Zhaojiang
    Zhou, Zhenpeng
    Moon, Seungwhan
    Crook, Paul
    Liu, Bing
    Yu, Zhou
    Cho, Eunjoon
    Fung, Pascale
    Wang, Zhiguang
    [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 7452 - 7467
  • [3] A Survey on Task-Oriented Dialogue Systems
    Wang, Zhen-Yu
    [J]. Science Press, 43 : 1862 - 1896
  • [4] MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems
    Lin, Zhaojiang
    Madotto, Andrea
    Winata, Genta Indra
    Fung, Pascale
    [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 3391 - 3405
  • [5] Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems
    Li, Ziming
    Kiseleva, Julia
    de Rijke, Maarten
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020,
  • [6] Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems
    Madotto, Andrea
    Cahyawijaya, Samuel
    Winata, Genta Indra
    Xu, Yan
    Liu, Zihan
    Lin, Zhaojiang
    Fung, Pascale
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 2372 - 2394
  • [7] Incremental Learning from Scratch for Task-Oriented Dialogue Systems
    Wang, Weikang
    Zhang, Jiajun
    Li, Qian
    Hwang, Mei-Yuh
    Zong, Chengqing
    Li, Zhifei
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3710 - 3720
  • [8] Cold-started Curriculum Learning for Task-oriented Dialogue Policy
    Zhu, Hui
    Zhao, Yangyang
    Qin, Hua
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON E-BUSINESS ENGINEERING (ICEBE 2021), 2021, : 100 - 105
  • [9] Intent Disambiguation for Task-oriented Dialogue Systems
    Alfieri, Andrea
    Wolter, Ralf
    Hashemi, Seyyed Hadi
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 5079 - 5080
  • [10] Evaluating Task-oriented Dialogue Systems with Users
    Siro, Clemencia
    [J]. PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 3495 - 3495