Parallel encoder-decoder framework for image captioning

Cited by: 3
Authors
Saeidimesineh, Reyhane [1]
Adibi, Peyman [1]
Karshenas, Hossein [1]
Darvishy, Alireza [2]
Affiliations
[1] Univ Isfahan, Fac Comp Engn, Artificial Intelligence Dept, Esfahan, Iran
[2] Zurich Univ Appl Sci ZHAW, Sch Engn, Zurich, Switzerland
Keywords
Parallelization; Encoder-decoder framework; Image captioning; Natural language processing; REPRESENTATION; TRANSFORMER;
DOI
10.1016/j.knosys.2023.111056
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent progress in deep learning has led to the successful use of encoder-decoder frameworks, inspired by machine translation, in image captioning models. Stacking layers has made it possible to use several modules in encoders and decoders. However, stacked models have used only one type of module in the encoder or decoder. In this research, we propose a parallel encoder-decoder framework that aims to take advantage of multiple types of modules in encoders and decoders simultaneously. The framework contains augmented parallel blocks, each of which contains either stacked or non-stacked modules. The outputs of the blocks are then integrated to extract higher-level semantic concepts. This general idea is not limited to image captioning and can be customized for many applications that use encoder-decoder frameworks. We evaluated the proposed method on the MS-COCO dataset, where it achieved a CIDEr-D score of 149.92, outperforming state-of-the-art image captioning models.
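The parallel-block idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`run_block`, `parallel_encode`), the toy modules (`scale`, `shift`), and the choice of element-wise averaging as the integration step are all assumptions made for illustration, since the abstract does not say how block outputs are integrated.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Module = Callable[[Vector], Vector]


def run_block(modules: Sequence[Module], x: Vector) -> Vector:
    """Apply a block's modules in sequence (a one-module block is non-stacked)."""
    for module in modules:
        x = module(x)
    return x


def parallel_encode(blocks: Sequence[Sequence[Module]], x: Vector) -> Vector:
    """Feed the same input to every block, then fuse the outputs.

    Assumption: fusion is an element-wise average; the abstract only says
    the block results are "integrated".
    """
    outputs = [run_block(block, x) for block in blocks]
    return [sum(values) / len(outputs) for values in zip(*outputs)]


# Hypothetical toy modules standing in for real layers of different types.
def scale(v: Vector) -> Vector:
    return [2.0 * a for a in v]


def shift(v: Vector) -> Vector:
    return [a + 1.0 for a in v]


blocks = [
    [scale, scale],  # stacked block: two modules of one type
    [shift],         # non-stacked block: a single module of another type
]
fused = parallel_encode(blocks, [1.0, 2.0])
print(fused)  # [3.0, 5.5]
```

Each block sees the same input, so blocks built from different module types can run independently before their outputs are combined.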
Pages: 9
Related papers
50 in total
  • [1] Dynamic Convolution-based Encoder-Decoder Framework for Image Captioning in Hindi
    Mishra, Santosh Kumar
    Sinha, Sushant
    Saha, Sriparna
    Bhattacharyya, Pushpak
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (04)
  • [2] Deep Hierarchical Encoder-Decoder Network for Image Captioning
    Xiao, Xinyu
    Wang, Lingfeng
    Ding, Kun
    Xiang, Shiming
    Pan, Chunhong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (11) : 2942 - 2956
  • [3] Image Captioning: From Encoder-Decoder to Reinforcement Learning
    Tang, Yu
    [J]. 2022 6TH INTERNATIONAL CONFERENCE ON IMAGING, SIGNAL PROCESSING AND COMMUNICATIONS, ICISPC, 2022, : 6 - 10
  • [4] An Information Multiplexed Encoder-Decoder Network for Image Captioning in Hindi
    Mishra, Santosh Kumar
    Peethala, Mahesh Babu
    Saha, Sriparna
    Bhattacharyya, Pushpak
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 3019 - 3024
  • [5] The Optimal Choice of the Encoder-Decoder Model Components for Image Captioning
    Bartosiewicz, Mateusz
    Iwanowski, Marcin
    [J]. INFORMATION, 2024, 15 (08)
  • [6] Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning
    Yang, Xu
    Gao, Chongyang
    Zhang, Hanwang
    Cai, Jianfei
    [J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 4181 - 4189
  • [7] MICER: a pre-trained encoder-decoder architecture for molecular image captioning
    Yi, Jiacai
    Wu, Chengkun
    Zhang, Xiaochen
    Xiao, Xinyi
    Qiu, Yanlong
    Zhao, Wentao
    Hou, Tingjun
    Cao, Dongsheng
    [J]. BIOINFORMATICS, 2022, 38 (19) : 4562 - 4572
  • [8] Efficient Channel Attention Based Encoder-Decoder Approach for Image Captioning in Hindi
    Mishra, Santosh Kumar
    Rai, Gaurav
    Saha, Sriparna
    Bhattacharyya, Pushpak
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (03)
  • [9] An encoder-decoder based framework for hindi image caption generation
    Singh, Alok
    Singh, Thoudam Doren
    Bandyopadhyay, Sivaji
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (28-29) : 35721 - 35740