CISum: Learning Cross-modality Interaction to Enhance Multimodal Semantic Coverage for Multimodal Summarization

Times Cited: 0
Authors
Zhang, Litian [1 ]
Zhang, Xiaoming [1 ]
Guo, Ziming [1 ]
Liu, Zhipeng [1 ]
Affiliations
[1] Beihang Univ, Sch Cyber Sci & Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Summarization; Multimodal; Semantic coverage; Multi-task
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Multimodal summarization (MS) aims to generate a summary from multimodal input. Previous work focuses mainly on textual semantic coverage metrics such as ROUGE and treats the visual content as supplemental data, so the resulting summary fails to cover the semantics of the different modalities. This paper proposes a multi-task cross-modality learning framework (CISum) that improves multimodal semantic coverage by learning the cross-modality interaction within the multimodal article. To capture the visual semantics, images are translated into visual descriptions based on their correlation with the text content. The visual descriptions and text content are then fused to generate a textual summary that covers the semantics of the multimodal content, and the most relevant image is selected as the visual summary. Furthermore, we design an automatic multimodal semantic coverage metric to evaluate performance. Experimental results show that CISum outperforms baselines on multimodal semantic coverage metrics while maintaining excellent ROUGE and BLEU performance.
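To make the pipeline in the abstract concrete, the following Python sketch mocks its three stages under strong simplifying assumptions: images are already represented by short descriptions (CISum instead learns to translate images into visual descriptions conditioned on the text), the encoder is a toy hash-seeded embedding, and the coverage metric is a simple harmonic mean of textual and visual similarity. All names (embed, summarize, multimodal_coverage) and formulas here are illustrative assumptions, not the paper's implementation.

# Minimal sketch of a CISum-style pipeline, following the abstract.
# Every function and formula below is an illustrative assumption,
# not the authors' implementation.
import hashlib
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Deterministic toy text embedding; a stand-in for a learned encoder."""
    seed = int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:4], "little")
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

def summarize(article: str, image_descriptions: list) -> tuple:
    """Score each visual description by its correlation with the text,
    fuse text and visual semantics into a textual summary, and pick
    the most text-relevant image as the visual summary."""
    art = embed(article)
    scores = [float(art @ embed(d)) for d in image_descriptions]
    best = int(np.argmax(scores))
    # Placeholder "fusion": prepend the winning description to the lead sentence.
    text_summary = image_descriptions[best] + " " + article.split(". ")[0] + "."
    return text_summary, best

def multimodal_coverage(summary: str, article: str, description: str) -> float:
    """Toy multimodal semantic coverage: harmonic mean of the summary's
    similarity to the textual and to the visual semantics."""
    s = embed(summary)
    t_cov = max(float(s @ embed(article)), 1e-8)      # textual coverage
    v_cov = max(float(s @ embed(description)), 1e-8)  # visual coverage
    return 2 * t_cov * v_cov / (t_cov + v_cov)

if __name__ == "__main__":
    article = "CISum fuses text and images. It improves semantic coverage."
    descriptions = ["A diagram of the fusion model.", "A photo of the authors."]
    summary, idx = summarize(article, descriptions)
    print(summary, idx, round(multimodal_coverage(summary, article, descriptions[idx]), 3))

The harmonic mean is used in this sketch only because it penalizes a summary that covers one modality while ignoring the other; the paper's actual metric may be defined differently.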
Pages: 370-378
Page Count: 9