Parallel encoder-decoder framework for image captioning

Cited by: 3

Authors:

Saeidimesineh, Reyhane [1]

Adibi, Peyman [1]

Karshenas, Hossein [1]

Darvishy, Alireza [2]

Affiliations:

[1] Univ Isfahan, Fac Comp Engn, Artificial Intelligence Dept, Esfahan, Iran

[2] Zurich Univ Appl Sci ZHAW, Sch Engn, Zurich, Switzerland

Source:

KNOWLEDGE-BASED SYSTEMS | 2023 / Volume 282

Keywords:

Parallelization; Encoder-decoder framework; Image captioning; Natural language processing; REPRESENTATION; TRANSFORMER;

DOI:

10.1016/j.knosys.2023.111056

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recent progress in deep learning has led to the successful use of encoder-decoder frameworks, inspired by machine translation, in image captioning models. Stacking layers in encoders and decoders has made it possible to use several modules in each. However, stacked models have used just one type of module in the encoder or decoder. In this research, we propose a parallel encoder-decoder framework that aims to take advantage of multiple types of modules in encoders and decoders simultaneously. The framework contains augmented parallel blocks, which include stacked or non-stacked modules. The results of the blocks are then integrated to extract higher-level semantic concepts. This general idea is not limited to image captioning and can be customized for many applications that use encoder-decoder frameworks. We evaluated the proposed method on the MS-COCO dataset and achieved state-of-the-art results, obtaining 149.92 on the CIDEr-D metric and outperforming state-of-the-art image captioning models.
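
The parallel-block idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the helper names (`stacked`, `parallel`, `scale`, `shift`) are hypothetical, the modules are trivial stand-ins for real encoder or decoder modules, and the element-wise mean used to integrate block outputs is an assumption, since the abstract does not say how integration is done.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Module = Callable[[Vector], Vector]


def stacked(modules: Sequence[Module]) -> Module:
    """Compose modules sequentially, as in a conventional stacked encoder."""
    def run(x: Vector) -> Vector:
        for module in modules:
            x = module(x)
        return x
    return run


def parallel(blocks: Sequence[Module]) -> Module:
    """Run every block on the same input and integrate the outputs.

    Integration here is an element-wise mean, which is a placeholder
    assumption and not taken from the paper.
    """
    def run(x: Vector) -> Vector:
        outputs = [block(x) for block in blocks]
        return [sum(values) / len(values) for values in zip(*outputs)]
    return run


# Hypothetical toy modules standing in for two different module types.
def scale(x: Vector) -> Vector:
    return [2.0 * v for v in x]


def shift(x: Vector) -> Vector:
    return [v + 1.0 for v in x]


# One block stacks two modules of the same type; the other is non-stacked.
encoder = parallel([stacked([scale, scale]), shift])
features = encoder([1.0, 2.0])
print(features)
```

Here the stacked block and the single-module block see the same input, and only their outputs are combined, which is the structural difference from a purely sequential stack.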

Pages: 9

[1] Dynamic Convolution-based Encoder-Decoder Framework for Image Captioning in Hindi
Mishra, Santosh Kumar
Sinha, Sushant
Saha, Sriparna
Bhattacharyya, Pushpak
[J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (04)
[2] Deep Hierarchical Encoder-Decoder Network for Image Captioning
Xiao, Xinyu
Wang, Lingfeng
Ding, Kun
Xiang, Shiming
Pan, Chunhong
[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (11) : 2942 - 2956
[3] Image Captioning: From Encoder-Decoder to Reinforcement Learning
Tang, Yu
[J]. 2022 6TH INTERNATIONAL CONFERENCE ON IMAGING, SIGNAL PROCESSING AND COMMUNICATIONS, ICISPC, 2022, : 6 - 10
[4] An Information Multiplexed Encoder-Decoder Network for Image Captioning in Hindi
Mishra, Santosh Kumar
Peethala, Mahesh Babu
Saha, Sriparna
Bhattacharyya, Pushpak
[J]. 2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 3019 - 3024
[5] The Optimal Choice of the Encoder-Decoder Model Components for Image Captioning
Bartosiewicz, Mateusz
Iwanowski, Marcin
[J]. INFORMATION, 2024, 15 (08)
[6] Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning
Yang, Xu
Gao, Chongyang
Zhang, Hanwang
Cai, Jianfei
[J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 4181 - 4189
[7] MICER: a pre-trained encoder-decoder architecture for molecular image captioning
Yi, Jiacai
Wu, Chengkun
Zhang, Xiaochen
Xiao, Xinyi
Qiu, Yanlong
Zhao, Wentao
Hou, Tingjun
Cao, Dongsheng
[J]. BIOINFORMATICS, 2022, 38 (19) : 4562 - 4572
[8] Efficient Channel Attention Based Encoder-Decoder Approach for Image Captioning in Hindi
Mishra, Santosh Kumar
Rai, Gaurav
Saha, Sriparna
Bhattacharyya, Pushpak
[J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (03)
[9] An encoder-decoder based framework for hindi image caption generation
Singh, Alok
Singh, Thoudam Doren
Bandyopadhyay, Sivaji
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (28-29) : 35721 - 35740