Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model

Cited by: 4
Authors
Chen, Xiaolin [1]
Song, Xuemeng [2]
Jing, Liqiang [2]
Li, Shuo [2]
Hu, Linmei [3]
Nie, Liqiang [4]
Affiliations
[1] Shandong Univ, Sch Software, Joint SDU NTU Ctr Artificial Intelligence Res, Jinan, Peoples R China
[2] Shandong Univ, Sch Comp Sci & Technol, Jinan, Peoples R China
[3] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China
[4] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multimodal task-oriented dialog systems; text response generation; generative pretrained language model; dual knowledge selection
DOI
10.1145/3606368
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Text response generation for multimodal task-oriented dialog systems, which aims to generate the proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: (1) they overlook the benefit of generative pretraining and (2) they ignore the textual context-related knowledge. To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation. To be specific, the dual knowledge selection component aims to select the related knowledge according to both the textual and visual modalities of the given context. Thereafter, the dual knowledge-enhanced context learning component aims to seamlessly integrate the selected knowledge into the multimodal context learning from both global and local perspectives, where the cross-modal semantic relation is also explored. Moreover, the knowledge-enhanced response generation component comprises a revised BART decoder, where an additional dot-product knowledge-decoder attention sub-layer is introduced to explicitly utilize the knowledge to advance the text response generation. Extensive experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.
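The abstract gives no implementation details for the knowledge-decoder attention sub-layer, so the following is only a minimal sketch of the general idea of dot-product attention from decoder states over selected knowledge vectors. The function names, the 1/sqrt(d) scaling, and the toy inputs are all assumptions made for illustration; this is not the authors' DKMD code and it does not reproduce the revised BART decoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knowledge_attention(decoder_states, knowledge_vectors):
    """Hypothetical dot-product attention of decoder states over knowledge.

    decoder_states:    list of query vectors, one per decoder position
    knowledge_vectors: list of knowledge embeddings (used as keys and values)
    Returns (contexts, weights): one knowledge summary and one attention
    distribution per decoder position.
    """
    dim = len(knowledge_vectors[0])
    scale = math.sqrt(dim)  # assumption: scaled dot product, as in Transformers
    contexts, weights = [], []
    for query in decoder_states:
        # Similarity of this decoder position to every knowledge entry.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in knowledge_vectors
        ]
        attn = softmax(scores)
        # Weighted sum of the knowledge vectors.
        context = [
            sum(w * vec[i] for w, vec in zip(attn, knowledge_vectors))
            for i in range(dim)
        ]
        contexts.append(context)
        weights.append(attn)
    return contexts, weights


if __name__ == "__main__":
    states = [[1.0, 0.0]]                    # toy decoder state
    knowledge = [[1.0, 0.0], [0.0, 1.0]]     # toy knowledge embeddings
    ctx, attn = knowledge_attention(states, knowledge)
    print(attn[0], ctx[0])
```

In a full decoder layer, a summary like `contexts` would typically be combined with the decoder state (for example through a residual connection and layer normalization); how DKMD does this is not stated in the abstract.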
Pages: 25