Multi-granularity semantic relational mapping for image caption

Cited by: 0
Authors
Gao, Nan [1]
Yao, Renyuan [1]
Chen, Peng [1]
Liang, Ronghua [1]
Sun, Guodao [1]
Tang, Jijun [2]
Affiliations
[1] Zhejiang Univ Technol, Hangzhou 310014, Peoples R China
[2] Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image caption; Multi-granularity; Dynamical semantic cue; Cross-attention; Transformer
DOI
10.1016/j.eswa.2024.125847
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To construct object-relationship descriptions for images, existing image captioning methods incorporate regional semantic features into visual features to enhance the visual representation. However, they neglect the construction of grid semantic features, so the generated captions lack accurate, detailed relationships. We propose a Multi-granularity Semantic Relational Mapping (MSRM) framework that dynamically extracts semantic cue features from the image in place of traditional region labels, which overcomes the limited semantic capacity of fixed classification labels and makes it possible to construct grid semantic features. MSRM uses the Internal Semantic Mapping mechanism to refine the semantic features by filtering out irrelevant ones and mapping the rest onto region and grid features. Simultaneously, the Semantic Mapping mechanism integrates the composite features derived from regions and grids, thereby addressing the problem of describing semantic relationships among objects across different granularities. Experiments on the MSCOCO and Flickr30k datasets show that the proposed MSRM significantly outperforms state-of-the-art baselines by more than 4% on 7 different metrics, including BLEU, METEOR, ROUGE, and CIDEr.
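The abstract does not give the equations behind its mapping mechanisms, so the following is only a minimal sketch of the general idea it names in its keywords: cross-attention that maps a small set of semantic cue vectors onto region and grid features, followed by a fusion step across the two granularities. Everything here is an assumption made for illustration: the function names (`cross_attention`, `map_semantics`), the residual addition, the absence of learned projections, and the toy 4-dimensional vectors. It is not the authors' MSRM implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product cross-attention without learned projections.

    Each query vector attends over keys_values, which serve as both
    keys and values (an illustrative simplification).
    """
    dim = len(keys_values[0])
    outputs = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        attended = [
            sum(w * kv[j] for w, kv in zip(weights, keys_values))
            for j in range(dim)
        ]
        outputs.append(attended)
    return outputs


def map_semantics(visual, context_source):
    """Hypothetical mapping step: add attended context to each visual feature."""
    context = cross_attention(visual, context_source)
    return [[v + c for v, c in zip(vis, ctx)] for vis, ctx in zip(visual, context)]


# Toy inputs (assumed shapes): 2 semantic cues, 2 region features, 3 grid features.
semantic_cues = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
region_feats = [[0.5, 0.1, 0.0, 0.2], [0.0, 0.9, 0.3, 0.1]]
grid_feats = [[0.2, 0.2, 0.2, 0.2], [0.7, 0.0, 0.1, 0.0], [0.0, 0.0, 1.0, 0.0]]

# Map the semantic cues onto each granularity separately.
region_sem = map_semantics(region_feats, semantic_cues)
grid_sem = map_semantics(grid_feats, semantic_cues)

# Fuse granularities: region features attend over the grid features.
fused = map_semantics(region_sem, grid_sem)
```

In this sketch the same attention routine is reused for both steps; a real model would typically use separate learned query, key, and value projections and multiple heads for each stage.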
Pages: 12
Related papers
50 in total
  • [41] Multi-granularity interaction model based on pinyins and radicals for Chinese semantic matching
    Zhao, Pengyu
    Lu, Wenpeng
    Wang, Shoujin
    Peng, Xueping
    Jian, Ping
    Wu, Hao
    Zhang, Weiyu
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2022, 25 (4): 1703-1723
  • [42] Multi-granularity interaction model based on pinyins and radicals for Chinese semantic matching
    Pengyu Zhao
    Wenpeng Lu
    Shoujin Wang
    Xueping Peng
    Ping Jian
    Hao Wu
    Weiyu Zhang
    World Wide Web, 2022, 25: 1703-1723
  • [43] Transferable dual multi-granularity semantic excavating for partially relevant video retrieval
    Cheng, Dingxin
    Kong, Shuhan
    Jiang, Bin
    Guo, Qiang
    IMAGE AND VISION COMPUTING, 2024, 149
  • [44] Learning multi-granularity features from multi-granularity regions for person re-identification
    Yang, Kaiwen
    Yang, Jiwei
    Tian, Xinmei
    NEUROCOMPUTING, 2021, 432: 206-215
  • [45] Semantic-fused multi-granularity cross-city traffic prediction
    Chen, Kehua
    Liang, Yuxuan
    Han, Jindong
    Feng, Siyuan
    Zhu, Meixin
    Yang, Hai
    TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2024, 162
  • [46] Multi-granularity generative adversarial nets with reconstructive sampling for image inpainting
    Xu, Liming
    Zeng, Xianhua
    Li, Weisheng
    Huang, Zhiwei
    NEUROCOMPUTING, 2020, 402: 220-234
  • [47] Combining Multi-granularity Text Semantics with Graph Relational Semantics for Question Retrieval in CQA
    Li, Hong
    Li, Jianjun
    Jin, Huazhong
    Chen, Zixuan
    Zou, Wei
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT II, ICIC 2024, 2024, 14876: 53-64
  • [48] Edge consistent image completion based on multi-granularity feature fusion
    Zhang S.-Y.
    Wang G.-Y.
    Liu Q.
    Wang R.-Q.
    Kongzhi yu Juece/Control and Decision, 2022, 37 (12): 3240-3250
  • [49] Image classification based on multi-granularity convolutional Neural network model
    Wu, Xiaogang
    Tanprasert, Thitipong
    Jing, Wang
    2022 19TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE 2022), 2022
  • [50] Efficient multi-granularity network for fine-grained image classification
    Jiabao Wang
    Yang Li
    Hang Li
    Xun Zhao
    Rui Zhang
    Zhuang Miao
    Journal of Real-Time Image Processing, 2022, 19: 853-866