Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model

Cited: 4
Authors
Chen, Xiaolin [1 ]
Song, Xuemeng [2 ]
Jing, Liqiang [2 ]
Li, Shuo [2 ]
Hu, Linmei [3 ]
Nie, Liqiang [4 ]
Affiliations
[1] Shandong Univ, Sch Software, Joint SDU NTU Ctr Artificial Intelligence Res, Jinan, Peoples R China
[2] Shandong Univ, Sch Comp Sci & Technol, Jinan, Peoples R China
[3] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China
[4] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal task-oriented dialog systems; text response generation; generative pretrained language model; dual knowledge selection;
DOI
10.1145/3606368
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Text response generation for multimodal task-oriented dialog systems, which aims to generate the proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: (1) they overlook the benefit of generative pretraining, and (2) they ignore the textual context-related knowledge. To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation. To be specific, the dual knowledge selection component aims to select the related knowledge according to both textual and visual modalities of the given context. Thereafter, the dual knowledge-enhanced context learning component targets seamlessly integrating the selected knowledge into multimodal context learning from both global and local perspectives, where the cross-modal semantic relation is also explored. Moreover, the knowledge-enhanced response generation component comprises a revised BART decoder, where an additional dot-product knowledge-decoder attention sub-layer is introduced to explicitly utilize the knowledge to advance text response generation. Extensive experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.
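The extra sub-layer in the revised decoder is described as dot-product attention from decoder states to the selected knowledge. The paper's exact formulation is not given in this record, so the following is only a minimal sketch of such a knowledge-decoder attention sub-layer; the function name, shapes, and the residual connection are illustrative assumptions in the style of a standard Transformer sub-layer, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def knowledge_decoder_attention(decoder_states, knowledge_vectors):
    """Dot-product cross-attention from decoder states (queries) to
    selected knowledge vectors (keys/values), sketched as one extra
    decoder sub-layer. Shapes: decoder_states (T, d), knowledge (K, d).
    """
    d = decoder_states.shape[-1]
    # scaled dot-product scores between each decoder position and each
    # selected knowledge entry: (T, K)
    scores = decoder_states @ knowledge_vectors.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)          # attention over knowledge
    attended = weights @ knowledge_vectors      # (T, d) knowledge summary
    # residual connection, as in standard Transformer sub-layers
    return decoder_states + attended

rng = np.random.default_rng(0)
dec = rng.normal(size=(5, 16))  # 5 decoder positions, hidden size 16
kn = rng.normal(size=(8, 16))   # 8 selected knowledge entries
out = knowledge_decoder_attention(dec, kn)
print(out.shape)  # (5, 16)
```

In a full model this sub-layer would sit alongside the usual self-attention and encoder cross-attention sub-layers of the BART decoder, with learned query/key/value projections and layer normalization around it; those pieces are omitted here for brevity.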
Pages: 25