CISum: Learning Cross-modality Interaction to Enhance Multimodal Semantic Coverage for Multimodal Summarization

Times Cited: 0
Authors
Zhang, Litian [1 ]
Zhang, Xiaoming [1 ]
Guo, Ziming [1 ]
Liu, Zhipeng [1 ]
Affiliations
[1] Beihang Univ, Sch Cyber Sci & Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Summarization; Multimodal; Semantic coverage; Multi-task
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Multimodal summarization (MS) aims to generate a summary from multimodal input. Previous work focuses mainly on textual semantic coverage metrics such as ROUGE and treats the visual content as supplemental data, so the resulting summary fails to cover the semantics of the different modalities. This paper proposes a multi-task cross-modality learning framework (CISum) that improves multimodal semantic coverage by learning the cross-modality interaction within the multimodal article. To capture the visual semantics, images are translated into visual descriptions based on their correlation with the text content. The visual descriptions and text content are then fused to generate a textual summary that covers the semantics of the multimodal content, and the most relevant image is selected as the visual summary. Furthermore, we design an automatic multimodal semantic coverage metric to evaluate performance. Experimental results show that CISum outperforms baselines on multimodal semantic coverage metrics while maintaining excellent ROUGE and BLEU performance.
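To make the pipeline in the abstract concrete, the following Python sketch mocks its three stages under strong simplifying assumptions: images are already represented by short descriptions (CISum instead learns to translate images into visual descriptions conditioned on the text), the encoder is a toy hash-seeded embedding, and the coverage metric is a simple harmonic mean of textual and visual similarity. All names (embed, summarize, multimodal_coverage) and formulas here are illustrative assumptions, not the paper's implementation.

# Minimal sketch of a CISum-style pipeline, following the abstract.
# Every function and formula below is an illustrative assumption,
# not the authors' implementation.
import hashlib
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Deterministic toy text embedding; a stand-in for a learned encoder."""
    seed = int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:4], "little")
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

def summarize(article: str, image_descriptions: list) -> tuple:
    """Score each visual description by its correlation with the text,
    fuse text and visual semantics into a textual summary, and pick
    the most text-relevant image as the visual summary."""
    art = embed(article)
    scores = [float(art @ embed(d)) for d in image_descriptions]
    best = int(np.argmax(scores))
    # Placeholder "fusion": prepend the winning description to the lead sentence.
    text_summary = image_descriptions[best] + " " + article.split(". ")[0] + "."
    return text_summary, best

def multimodal_coverage(summary: str, article: str, description: str) -> float:
    """Toy multimodal semantic coverage: harmonic mean of the summary's
    similarity to the textual and to the visual semantics."""
    s = embed(summary)
    t_cov = max(float(s @ embed(article)), 1e-8)      # textual coverage
    v_cov = max(float(s @ embed(description)), 1e-8)  # visual coverage
    return 2 * t_cov * v_cov / (t_cov + v_cov)

if __name__ == "__main__":
    article = "CISum fuses text and images. It improves semantic coverage."
    descriptions = ["A diagram of the fusion model.", "A photo of the authors."]
    summary, idx = summarize(article, descriptions)
    print(summary, idx, round(multimodal_coverage(summary, article, descriptions[idx]), 3))

The harmonic mean is used in this sketch only because it penalizes a summary that covers one modality while ignoring the other; the paper's actual metric may be defined differently.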
Pages: 370-378
Page Count: 9