Parallel encoder-decoder framework for image captioning

Cited by: 3
Authors
Saeidimesineh, Reyhane [1 ]
Adibi, Peyman [1 ]
Karshenas, Hossein [1 ]
Darvishy, Alireza [2 ]
Affiliations
[1] Univ Isfahan, Fac Comp Engn, Artificial Intelligence Dept, Esfahan, Iran
[2] Zurich Univ Appl Sci ZHAW, Sch Engn, Zurich, Switzerland
Keywords
Parallelization; Encoder-decoder framework; Image captioning; Natural language processing; REPRESENTATION; TRANSFORMER;
DOI
10.1016/j.knosys.2023.111056
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Recent progress in deep learning has led to the successful use of encoder-decoder frameworks, inspired by machine translation, in image captioning models. Stacking layers in encoders and decoders has made it possible to use several modules in each; however, stacked models have used only one type of module in the encoder or decoder. In this research, we propose a parallel encoder-decoder framework that aims to take advantage of multiple types of modules in encoders and decoders simultaneously. This framework contains augmented parallel blocks, which may include stacked or non-stacked modules; the results of the blocks are then integrated to extract higher-level semantic concepts. This general idea is not limited to image captioning and can be customized for many applications that use encoder-decoder frameworks. Evaluated on the MS-COCO dataset, our method achieves a CIDEr-D score of 149.92, outperforming state-of-the-art image captioning models.
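The core idea of the abstract, several different module types applied in parallel to the same input and then fused, can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the module functions (`scale`, `shift`) and the mean-fusion rule are assumptions chosen only to make the parallel-block pattern concrete.

```python
# Illustrative sketch (not the paper's code): a "parallel block" applies
# several different module types to the same input features and fuses
# their outputs, in contrast to a stack that repeats one module type.
from typing import Callable, List, Sequence

Module = Callable[[List[float]], List[float]]

def parallel_block(modules: Sequence[Module],
                   fuse: Callable[[List[List[float]]], List[float]],
                   features: List[float]) -> List[float]:
    """Apply every module to the same features, then fuse the results."""
    outputs = [m(features) for m in modules]
    return fuse(outputs)

# Two toy "module types" standing in for different encoder/decoder modules.
scale = lambda x: [2.0 * v for v in x]
shift = lambda x: [v + 1.0 for v in x]

# Element-wise mean fusion integrates the parallel outputs into one
# higher-level representation.
def mean_fuse(outs: List[List[float]]) -> List[float]:
    return [sum(vals) / len(vals) for vals in zip(*outs)]

fused = parallel_block([scale, shift], mean_fuse, [1.0, 2.0])
# fused == [2.0, 3.5]
```

In the paper's setting the toy modules would be replaced by real encoder or decoder blocks (e.g. different Transformer variants) and the fusion step by a learned integration layer; the control flow, however, stays the same.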
Pages: 9