Multi-granularity semantic relational mapping for image caption

Cited by: 0

Authors:

Gao, Nan [1]

Yao, Renyuan [1]

Chen, Peng [1]

Liang, Ronghua [1]

Sun, Guodao [1]

Tang, Jijun [2]

Affiliations:

[1] Zhejiang Univ Technol, Hangzhou 310014, Peoples R China

[2] Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2025 / Vol. 264

Funding:

National Natural Science Foundation of China;

Keywords:

Image caption; Multi-granularity; Dynamical semantic cue; Cross-attention; Transformer;

DOI:

10.1016/j.eswa.2024.125847

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

To describe object relationships in images, existing image captioning methods incorporate regional semantic features into visual features to enhance the visual representation. However, they neglect the construction of grid semantic features, so the generated captions lack accurate, detailed relationships. We propose a Multi-granularity Semantic Relational Mapping (MSRM) framework that dynamically extracts image semantic cue features in place of traditional region labeling, which removes the limited semantic capacity of fixed classification labels and makes it possible to construct grid semantic features. MSRM uses the Internal Semantic Mapping mechanism to refine semantic features by filtering out irrelevant features and mapping them onto region and grid features. Simultaneously, the Semantic Mapping mechanism integrates the composite features derived from regions and grids, thereby addressing the problem of describing semantic relationships among objects across different granularities. Experiments on the MSCOCO and Flickr30k datasets show that the proposed MSRM significantly outperforms the state-of-the-art baselines by more than 4% in 7 different metrics including BLEUs, METEOR, ROUGE and CIDEr.
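The abstract names cross-attention as the tool for mapping semantic cues onto region and grid features, but gives no equations. The following is only a minimal sketch of that general idea, using plain scaled dot-product cross-attention with a residual addition. It is not the authors' implementation: the function names (`cross_attention`, `map_semantics`), the residual fusion, the lack of learned projections, and the toy 2-dimensional features are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: each query row attends over all key/value rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def map_semantics(visual, semantic):
    """Hypothetical mapping step: add the attended semantic cues to each visual feature."""
    attended = cross_attention(visual, semantic, semantic)
    return [[x + a for x, a in zip(row, att)] for row, att in zip(visual, attended)]


# Toy features (assumed values): 2 regions, 3 grid cells, 2 semantic cues, dimension 2.
region_feats = [[1.0, 0.0], [0.0, 1.0]]
grid_feats = [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]]
semantic_cues = [[1.0, 0.0], [0.0, 1.0]]

# Map the semantic cues onto each granularity separately.
region_sem = map_semantics(region_feats, semantic_cues)
grid_sem = map_semantics(grid_feats, semantic_cues)

# Cross-granularity fusion (assumed form): region features attend over grid features.
fused = map_semantics(region_sem, grid_sem)
```

In this sketch the same function serves both stages: first each granularity attends over the semantic cues, then the semantically enriched regions attend over the enriched grid cells. A real model would add learned query, key and value projections and multiple heads.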


Pages: 12
