Image Caption via Visual Attention Switch on DenseNet

Cited by: 0
Authors
Hao, Yanlong [1 ]
Xie, Jiyang [1 ]
Lin, Zhiqing [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Pattern Recognit & Intelligent Syst Lab, Beijing 100876, Peoples R China
Keywords
Image caption; Visual attention switch; Encoder-decoder architecture; DenseNet; LSTM;
DOI: not available
CLC number: TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline codes: 0808; 0809
Abstract
We introduce a novel approach for converting images into corresponding natural-language descriptions. The method follows the popular encoder-decoder architecture: the encoder uses the recently proposed densely connected convolutional network (DenseNet) to extract feature maps, while the decoder uses a long short-term memory (LSTM) network to parse the feature maps into descriptions. We predict the next word of a description by effectively combining the feature maps with the word embedding of the current input word via a "visual attention switch". Finally, we compare the proposed model against baseline models and achieve good results.
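The abstract only sketches the mechanism, so the following is a minimal, hypothetical numpy illustration of one decoding step under common assumptions: soft attention over DenseNet region features conditioned on the LSTM hidden state, plus a scalar sigmoid "switch" gate that decides how much attended visual context to blend into the current word embedding before it is fed to the LSTM. All shapes, weight names (`W_att`, `W_gate`, `W_ctx`), and the gating form are assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed, illustrative dimensions (not from the paper):
# 49 spatial regions of 1024-d DenseNet features; 256-d embeddings; 512-d hidden state.
R, Dv, De, H = 49, 1024, 256, 512

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical parameters of the attention scorer, the switch gate,
# and a projection of the visual context to the embedding size.
W_att = rng.normal(scale=0.01, size=(Dv + H, 1))
W_gate = rng.normal(scale=0.01, size=(De + H, 1))
W_ctx = rng.normal(scale=0.01, size=(Dv, De))

def decode_step(features, embed, h_prev):
    """Build one LSTM input with a visual attention switch (sketch).

    features: (R, Dv) DenseNet region features
    embed:    (De,)   embedding of the current input word
    h_prev:   (H,)    previous LSTM hidden state
    """
    # Soft attention: score each region against the hidden state.
    scores = np.concatenate([features, np.tile(h_prev, (R, 1))], axis=1) @ W_att
    alpha = softmax(scores.ravel())          # (R,) attention weights, sum to 1
    context = alpha @ features               # (Dv,) attended visual context

    # "Switch": scalar gate in [0, 1] choosing how much visual context
    # to mix with the word embedding at this step.
    s = sigmoid(np.concatenate([embed, h_prev]) @ W_gate).item()
    x_t = (1.0 - s) * embed + s * (context @ W_ctx)  # (De,) LSTM input
    return x_t, alpha, s

feats = rng.normal(size=(R, Dv))
emb = rng.normal(size=De)
x_t, alpha, s = decode_step(feats, emb, np.zeros(H))
print(x_t.shape, alpha.shape, s)
```

In a full model this step would run inside the decoding loop, with `x_t` fed to the LSTM cell and the output softmax predicting the next word; here it only shows how the gate interpolates between purely textual and visually grounded inputs.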
Pages: 334-338 (5 pages)