MIXED KNOWLEDGE RELATION TRANSFORMER FOR IMAGE CAPTIONING

Cited by: 0

Authors:

Chen, Tianyu [1]

Li, Zhixin [1]

Wei, Jiahui [1]

Xian, Tiantao [1]

Affiliations:

[1] Guangxi Normal Univ, Guangxi Key Lab Multisource Informat Min & Secur, Guilin 541004, Peoples R China

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

image captioning; external knowledge; object relation; LANGUAGE;

DOI:

10.1109/ICASSP43922.2022.9747541

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

The internal relationships among image objects have contributed significantly to the development of image captioning, especially when combined with the Transformer architecture. Most of these methods calculate only the relationships between entities and ignore the information between entities and the background. Moreover, the ways of exploring relational information within the image can be extended. In this paper, we further explore the relationships between objects from both internal and external perspectives, and embed vital global image information into the internal relationship module. To validate the effectiveness of our model, we conduct extensive experiments on the popular MSCOCO dataset and achieve state-of-the-art performance on both the online and offline test sets.

Pages: 4403-4407

Page count: 5
