Multi-granularity semantic relational mapping for image caption

Authors
Gao, Nan [1]
Yao, Renyuan [1]
Chen, Peng [1]
Liang, Ronghua [1]
Sun, Guodao [1]
Tang, Jijun [2]
Affiliations
[1] Zhejiang Univ Technol, Hangzhou 310014, Peoples R China
[2] Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image caption; Multi-granularity; Dynamical semantic cue; Cross-attention; Transformer
DOI
10.1016/j.eswa.2024.125847
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To construct object-relationship descriptions for images, existing image captioning methods incorporate regional semantic features into visual features to enhance the visual representation. However, they neglect the construction of grid semantic features, so the generated captions lack accurate, detailed relationships. We propose a Multi-granularity Semantic Relational Mapping (MSRM) framework that dynamically extracts semantic cue features from the image in place of traditional region labeling. This removes the limit that fixed classification labels place on semantic capability and makes it possible to construct grid semantic features. MSRM uses the Internal Semantic Mapping mechanism to refine the semantic features by filtering out irrelevant ones and mapping the rest onto region and grid features. Simultaneously, the Semantic Mapping mechanism integrates the composite features derived from regions and grids, thereby addressing the problem of describing semantic relationships among objects across different granularities. Experiments on the MSCOCO and Flickr30k datasets show that the proposed MSRM significantly outperforms state-of-the-art baselines by more than 4% on seven metrics, including BLEU, METEOR, ROUGE, and CIDEr.
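The abstract names cross-attention as the way semantic cues are mapped onto region and grid features but gives no implementation detail. The following is a minimal illustrative sketch of that general idea, not the authors' MSRM code. It assumes plain scaled dot-product attention with no learned projections, a residual addition, and random placeholder features. All names (`cross_attention`, `region_feats`, `grid_feats`, `semantic_cues`) and all shapes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Scaled dot-product cross-attention (no learned projections).

    queries: (n_q, d) features that receive information.
    context: (n_c, d) features that provide information.
    Returns the attended features (n_q, d) and the weights (n_q, n_c).
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ context, weights


# Placeholder inputs; real features would come from visual encoders.
rng = np.random.default_rng(0)
d = 16
region_feats = rng.normal(size=(5, d))    # assumed: 5 detected regions
grid_feats = rng.normal(size=(49, d))     # assumed: 7 x 7 grid cells
semantic_cues = rng.normal(size=(8, d))   # assumed: 8 semantic cue vectors

# Step 1 (assumption): map semantic cues onto each granularity separately.
# Low attention weights act as a soft filter on less relevant cues.
region_sem, region_w = cross_attention(region_feats, semantic_cues)
grid_sem, grid_w = cross_attention(grid_feats, semantic_cues)
region_enriched = region_feats + region_sem   # residual connection
grid_enriched = grid_feats + grid_sem

# Step 2 (assumption): let region features attend to grid features so that
# each region gathers fine-grained context from the other granularity.
fused, fuse_w = cross_attention(region_enriched, grid_enriched)
```

In this sketch each row of an attention weight matrix sums to 1, so every region or grid cell receives a convex combination of the context vectors. A trained model would add learned query, key, and value projections and multiple heads.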
Pages: 12