Multi-granularity semantic relational mapping for image caption

Cited by: 0

Authors:

Gao, Nan [1]

Yao, Renyuan [1]

Chen, Peng [1]

Liang, Ronghua [1]

Sun, Guodao [1]

Tang, Jijun [2]

Affiliations:

[1] Zhejiang Univ Technol, Hangzhou 310014, Peoples R China

[2] Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2025 / Vol. 264

Funding:

National Natural Science Foundation of China;

Keywords:

Image caption; Multi-granularity; Dynamical semantic cue; Cross-attention; TRANSFORMER;

DOI:

10.1016/j.eswa.2024.125847

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

To describe object relationships in images, existing image captioning methods incorporate regional semantic features into visual features to enhance the visual representation. However, they neglect the construction of grid semantic features, so the generated captions lack accurate, detailed relationships. We propose a Multi-granularity Semantic Relational Mapping (MSRM) framework that dynamically extracts semantic cue features from the image in place of traditional region labels, removing the limited semantic capacity of fixed classification labels and enabling the construction of grid semantic features. MSRM uses the Internal Semantic Mapping mechanism to refine the semantic features, filtering out irrelevant ones and mapping the remainder onto region and grid features. Simultaneously, the Semantic Mapping mechanism integrates the composite features derived from regions and grids, thereby addressing the problem of describing semantic relationships among objects across different granularities. Experiments on the MSCOCO and Flickr30k datasets show that the proposed MSRM significantly outperforms state-of-the-art baselines by more than 4% on 7 metrics, including BLEU scores, METEOR, ROUGE, and CIDEr.
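The abstract names two mapping steps and the keywords mention cross-attention, but this record gives no equations. The following is only a minimal sketch of how mapping semantic cues onto region and grid features, and then integrating the two granularities, could be expressed with plain scaled dot-product cross-attention. It is not the authors' implementation: the feature sizes, the residual addition, the absence of learned projections, and all names (`cross_attention`, `random_features`, `region_mapped`, and so on) are assumptions made for illustration.

```python
import math
import random

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, context):
    """Scaled dot-product cross-attention with no learned projections.

    Each query vector attends over all context vectors and receives
    their attention-weighted average.
    """
    scale = math.sqrt(len(queries[0]))
    dim = len(context[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / scale for c in context]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wi * c[j] for wi, c in zip(w, context))
                        for j in range(dim)])
    return outputs, weights

def add(xs, ys):
    """Element-wise sum of two equally shaped lists of vectors."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]

def random_features(n, d, rng):
    """Hypothetical stand-in for features from a real image encoder."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

rng = random.Random(0)
d = 8                                   # assumed feature dimension
semantic = random_features(5, d, rng)   # assumed: 5 semantic cue vectors
region = random_features(10, d, rng)    # assumed: 10 region features
grid = random_features(49, d, rng)      # assumed: 7x7 grid features

# Step 1 (assumed form): map semantic cues onto each granularity.
region_sem, w_region = cross_attention(region, semantic)
grid_sem, w_grid = cross_attention(grid, semantic)
region_mapped = add(region, region_sem)  # residual addition is an assumption
grid_mapped = add(grid, grid_sem)

# Step 2 (assumed form): integrate granularities, regions attending to grids.
fused, w_fused = cross_attention(region_mapped, grid_mapped)
```

In this sketch the attention weights act as a soft filter: semantic cues that score low against a region or grid feature contribute little to its mapped representation. A real model would add learned query, key, and value projections and multiple heads.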


Pages: 12
