Multi-granularity semantic relational mapping for image caption

Citations: 0
Authors
Gao, Nan [1]
Yao, Renyuan [1]
Chen, Peng [1]
Liang, Ronghua [1]
Sun, Guodao [1]
Tang, Jijun [2]
Affiliations
[1] Zhejiang Univ Technol, Hangzhou 310014, Peoples R China
[2] Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image caption; Multi-granularity; Dynamical semantic cue; Cross-attention; TRANSFORMER;
DOI
10.1016/j.eswa.2024.125847
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
To describe object relationships in images, existing image captioning methods incorporate regional semantic features into visual features to enhance the visual representation. However, they neglect the construction of grid semantic features, so the generated captions lack accurate, detailed relationships. We propose a Multi-granularity Semantic Relational Mapping (MSRM) framework that dynamically extracts image semantic cue features in place of traditional region labeling, which removes the limited semantic capacity of fixed classification labels and makes it possible to construct grid semantic features. MSRM uses the Internal Semantic Mapping mechanism to refine semantic features by filtering out irrelevant features and mapping them onto region and grid features. Simultaneously, the Semantic Mapping mechanism integrates the composite features derived from regions and grids, thereby addressing the problem of describing semantic relationships among objects across different granularities. Experiments on the MSCOCO and Flickr30k datasets show that the proposed MSRM significantly outperforms the state-of-the-art baselines by more than 4% on 7 different metrics, including the BLEU scores, METEOR, ROUGE, and CIDEr.
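The abstract and keywords mention cross-attention as the way semantic cues are mapped onto region and grid features. The sketch below is only a generic illustration of scaled dot-product cross-attention, not the MSRM implementation: the function names, the toy feature values, and the choice to use the semantic cues as both keys and values are all assumptions made for this example.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    # Generic scaled dot-product cross-attention:
    # every query row attends over all key/value rows.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Hypothetical toy features (assumed values, 2-dimensional for readability).
region_feats = [[1.0, 0.0], [0.0, 1.0]]               # coarse granularity
grid_feats = [[0.5, 0.5], [1.0, 1.0], [0.0, 0.2]]     # fine granularity
semantic_cues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # semantic cue features

# Assumption: visual features act as queries, semantic cues as keys and values,
# so each region or grid cell receives a weighted mix of the cues.
region_sem = cross_attention(region_feats, semantic_cues, semantic_cues)
grid_sem = cross_attention(grid_feats, semantic_cues, semantic_cues)
```

Each output row is a convex combination of the semantic cue vectors, so a region or grid cell that aligns more strongly with a cue receives more of that cue's features. How MSRM filters irrelevant cues or fuses the two granularities is not specified in the abstract and is not modeled here.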
Pages: 12