Towards sentiment aided dialogue policy learning for multi-intent conversations using hierarchical reinforcement learning

Cited by: 16
Authors
Saha, Tulika [1]
Saha, Sriparna [1]
Bhattacharyya, Pushpak [1]
Affiliations
[1] Indian Inst Technol Patna, Dept Comp Sci & Engn, Patna, Bihar, India
Source
PLOS ONE | 2020 / Vol. 15 / Issue 07
DOI
10.1371/journal.pone.0235367
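Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract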
Purpose: Developing a Dialogue/Virtual Agent (VA) that can handle complex user tasks (needs) spanning multiple intents of a domain is challenging, as it requires the agent to deal with multiple subtasks simultaneously. However, the majority of end-to-end dialogue systems incorporate only user semantics as inputs in the learning process and ignore other useful user behavior and information. The user's sentiment at the time of conversation plays an important role in securing maximum user gratification, so incorporating it during policy learning becomes crucial, all the more so when serving composite user tasks.
Methodology: As a first step towards enabling the development of a sentiment-aided VA for multi-intent conversations, this paper proposes a new dataset named SentiVA, collected from open-sourced dialogue datasets and annotated with the corresponding intent, slot, and sentiment (considering the entire dialogue history) labels. To integrate these multiple aspects, a Hierarchical Reinforcement Learning (HRL) based VA, specifically an options-based one, is proposed to learn strategies for managing multi-intent conversations. Along with task-success-based immediate rewards, sentiment-based immediate rewards are incorporated in the hierarchical value functions to make the VA user adaptive.
Findings: Empirically, the paper shows that task-based and sentiment-based immediate rewards are required together, rather than either reward alone, to ensure successful task completion and attain maximum user satisfaction in a multi-intent scenario.
Practical implications: The eventual evaluators and consumers of dialogue systems are users. Thus, ensuring a fulfilling conversational experience with maximum user satisfaction requires the VA to consider user sentiment at every time-step in its decision-making policy.
Originality: This work is the first attempt at incorporating sentiment-based rewards in the HRL framework.
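The idea of adding a sentiment-based immediate reward to a task-based one can be illustrated with a minimal sketch. This is not the paper's implementation: the reward values, the `sentiment_weight` parameter, the names `TurnOutcome`, `combined_reward`, and `q_update`, and the use of a plain tabular Q-learning update (rather than the hierarchical, options-based value functions the abstract describes) are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical mapping from a sentiment label to a scalar reward (assumed values).
SENTIMENT_REWARD = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


@dataclass
class TurnOutcome:
    task_reward: float  # assumed task signal, e.g. a small per-turn penalty
    sentiment: str      # "positive", "neutral", or "negative"


def combined_reward(outcome: TurnOutcome, sentiment_weight: float = 0.5) -> float:
    """Sum the task reward and a weighted sentiment reward (illustrative only)."""
    return outcome.task_reward + sentiment_weight * SENTIMENT_REWARD[outcome.sentiment]


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step on a dict keyed by (state, action)."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


if __name__ == "__main__":
    q_table = {}
    # A turn with a per-turn penalty and a negative user sentiment.
    r = combined_reward(TurnOutcome(task_reward=-1.0, sentiment="negative"))
    q_update(q_table, "s0", "ask_slot", r, "s1", actions=["ask_slot", "confirm"])
    print(r, q_table)
```

In a hierarchical setting, a function like `combined_reward` would presumably feed the value updates at more than one level of the policy; how the paper weights and distributes the two signals is not stated in the abstract.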
Pages: 28