Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model

Cited by: 4
Authors
Chen, Xiaolin [1 ]
Song, Xuemeng [2 ]
Jing, Liqiang [2 ]
Li, Shuo [2 ]
Hu, Linmei [3 ]
Nie, Liqiang [4 ]
Affiliations
[1] Shandong Univ, Sch Software, Joint SDU NTU Ctr Artificial Intelligence Res, Jinan, Peoples R China
[2] Shandong Univ, Sch Comp Sci & Technol, Jinan, Peoples R China
[3] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China
[4] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal task-oriented dialog systems; text response generation; generative pretrained language model; dual knowledge selection;
DOI
10.1145/3606368
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Text response generation for multimodal task-oriented dialog systems, which aims to generate a proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: they (1) overlook the benefit of generative pretraining and (2) ignore the knowledge related to the textual context. To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation. Specifically, the dual knowledge selection component selects the related knowledge according to both the textual and visual modalities of the given context. Thereafter, the dual knowledge-enhanced context learning component aims to seamlessly integrate the selected knowledge into the multimodal context learning from both global and local perspectives, where the cross-modal semantic relation is also explored. Moreover, the knowledge-enhanced response generation component comprises a revised BART decoder, in which an additional dot-product knowledge-decoder attention sub-layer is introduced to explicitly utilize the knowledge to advance text response generation. Extensive experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.
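The abstract names a dot-product knowledge-decoder attention sub-layer but gives no formula. The following is a minimal sketch of generic dot-product attention in which decoder states act as queries and knowledge vectors act as both keys and values. It is not the authors' implementation: the function names `softmax` and `knowledge_attention`, the scaling by the square root of the dimension, and the absence of learned projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knowledge_attention(decoder_states, knowledge):
    """Attend from each decoder state to a set of knowledge vectors.

    Hypothetical helper: decoder_states and knowledge are lists of
    equal-length float vectors. The 1/sqrt(d) scaling is an assumption;
    the abstract only says "dot-product".
    """
    d = len(knowledge[0])
    attended = []
    for q in decoder_states:
        # Dot product between the query and every knowledge vector.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in knowledge
        ]
        weights = softmax(scores)
        # Weighted sum of knowledge vectors (used here as values).
        context = [
            sum(w * k[j] for w, k in zip(weights, knowledge))
            for j in range(d)
        ]
        attended.append(context)
    return attended


if __name__ == "__main__":
    states = [[1.0, 0.0]]
    facts = [[1.0, 0.0], [0.0, 1.0]]
    print(knowledge_attention(states, facts))
```

In a full decoder layer, a result like this would typically be combined with the decoder state through a residual connection and normalization; how DKMD does this is not stated in the abstract.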
Pages: 25