Multi-granularity semantic relational mapping for image caption

Cited by: 0
Authors
Gao, Nan [1]
Yao, Renyuan [1]
Chen, Peng [1]
Liang, Ronghua [1]
Sun, Guodao [1]
Tang, Jijun [2]
Affiliations
[1] Zhejiang Univ Technol, Hangzhou 310014, Peoples R China
[2] Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image caption; Multi-granularity; Dynamical semantic cue; Cross-attention; Transformer
DOI
10.1016/j.eswa.2024.125847
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To describe object relationships in images, existing image captioning methods incorporate regional semantic features into visual features to enhance the visual representation. However, they neglect the construction of grid semantic features, so the generated captions lack accurate, detailed relationships. We propose a Multi-granularity Semantic Relational Mapping (MSRM) framework that dynamically extracts semantic cue features from the image in place of traditional region labels, which removes the limited semantic capacity of fixed classification labels and allows grid semantic features to be constructed. MSRM uses the Internal Semantic Mapping mechanism to refine the semantic features by filtering out irrelevant ones and mapping the refined features onto region and grid features. Simultaneously, the Semantic Mapping mechanism integrates the composite features derived from regions and grids, thereby addressing the problem of describing semantic relationships among objects across different granularities. Experiments on the MSCOCO and Flickr30k datasets show that the proposed MSRM significantly outperforms the state-of-the-art baselines by more than 4% on 7 metrics, including BLEU, METEOR, ROUGE, and CIDEr.
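The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of mapping semantic cue features onto region and grid features with cross-attention, and then integrating the two granularities. It is not the authors' MSRM code. The function name `cross_attend`, the feature sizes, the residual connection, and the absence of learned projection matrices are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(queries, context):
    """Scaled dot-product cross-attention with a residual connection.

    queries: (n, d) features to be enriched (e.g. region or grid features)
    context: (m, d) features attended to (e.g. semantic cue features)
    Returns the enriched (n, d) features and the (n, m) attention weights.
    Hypothetical simplification: no learned query/key/value projections.
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return queries + weights @ context, weights


# Toy inputs (assumed shapes): 5 regions, 9 grid cells, 4 semantic cues, d = 16.
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 16))
grids = rng.normal(size=(9, 16))
cues = rng.normal(size=(4, 16))

# Map the semantic cues onto each granularity separately.
region_sem, w_region = cross_attend(regions, cues)
grid_sem, w_grid = cross_attend(grids, cues)

# Integrate the two granularities: regions attend to the semantic grid features.
fused, w_fused = cross_attend(region_sem, grid_sem)
```

Low attention weights play the role of down-weighting cues that are irrelevant to a given region or grid cell; a trained model would learn projections that make this weighting meaningful.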
Pages: 12
Related papers
50 in total
  • [31] Image Retrieval Using Multi-Granularity Features of Color and Texture
    Xu, Xiangli
    Zhang, Libiao
    Liu, Xiangdong
    Yu, Zhezhou
    Zhou, Chunguang
    FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 4, PROCEEDINGS, 2008, : 54 - 58
  • [32] Multi-Granularity Relational Attention Network for Audio-Visual Question Answering
    Li, Linjun
    Jin, Tao
    Lin, Wang
    Jiang, Hao
    Pan, Wenwen
    Wang, Jian
    Xiao, Shuwen
    Xia, Yan
    Jiang, Weihao
    Zhao, Zhou
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7080 - 7094
  • [33] Multi-granularity Fatigue in Recommendation
    Xie, Ruobing
    Ling, Cheng
    Zhang, Shaoliang
    Xia, Feng
    Lin, Leyu
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 4595 - 4599
  • [34] Short text matching model with multiway semantic interaction based on multi-granularity semantic embedding
    Xianlun Tang
    Yang Luo
    Deyi Xiong
    Jingming Yang
    Rui Li
    Deguang Peng
    Applied Intelligence, 2022, 52 : 15632 - 15642
  • [35] Multi-granularity Attribute Reduction
    Liang, Shaochen
    Liu, Keyu
    Chen, Xiangjian
    Wang, Pingxin
    Yang, Xibei
    ROUGH SETS, IJCRS 2018, 2018, 11103 : 61 - 72
  • [36] Multi-granularity for knowledge distillation
    Shao, Baitan
    Chen, Ying
    IMAGE AND VISION COMPUTING, 2021, 115 (115)
  • [37] Multi-granularity resource Reservations
    Saewong, S
    Rajkumar, R
    RTSS 2005: 26th IEEE International Real-Time Systems Symposium, Proceedings, 2005, : 143 - 153
  • [38] Short text matching model with multiway semantic interaction based on multi-granularity semantic embedding
    Tang, Xianlun
    Luo, Yang
    Xiong, Deyi
    Yang, Jingming
    Li, Rui
    Peng, Deguang
    APPLIED INTELLIGENCE, 2022, 52 (13) : 15632 - 15642
  • [39] Multi-Granularity Representations of Dialog
    Mehri, Shikib
    Eskenazi, Maxine
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 1752 - 1761
  • [40] A multi-granularity genetic algorithm
    Li, Caoxiao
    Xia, Shuyin
    Chen, Zizhong
    Wang, Guoyin
    2019 10TH IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (ICBK 2019), 2019, : 135 - 141