Parallel encoder-decoder framework for image captioning

Cited by: 3
Authors
Saeidimesineh, Reyhane [1 ]
Adibi, Peyman [1 ]
Karshenas, Hossein [1 ]
Darvishy, Alireza [2 ]
Affiliations
[1] Univ Isfahan, Fac Comp Engn, Artificial Intelligence Dept, Esfahan, Iran
[2] Zurich Univ Appl Sci ZHAW, Sch Engn, Zurich, Switzerland
Keywords
Parallelization; Encoder-decoder framework; Image captioning; Natural language processing; REPRESENTATION; TRANSFORMER;
DOI
10.1016/j.knosys.2023.111056
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Recent progress in deep learning has led to the successful use of encoder-decoder frameworks, inspired by machine translation, in image captioning models. Stacking layers in encoders and decoders has made it possible to use several modules in each; however, stacked models have used only one type of module in the encoder or decoder. In this research, we propose a parallel encoder-decoder framework that aims to take advantage of multiple types of modules in encoders and decoders simultaneously. This framework contains augmented parallel blocks, which may include stacked or non-stacked modules; the results of the blocks are then integrated to extract higher-level semantic concepts. This general idea is not limited to image captioning and can be customized for many applications that use encoder-decoder frameworks. Evaluated on the MS-COCO dataset, our method achieves a CIDEr-D score of 149.92, outperforming state-of-the-art image captioning models.
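The core idea of the abstract, several different module types applied in parallel to the same input and then fused, can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the module functions (`scale`, `shift`) and the mean-fusion rule are assumptions chosen only to make the parallel-block pattern concrete.

```python
# Illustrative sketch (not the paper's code): a "parallel block" applies
# several different module types to the same input features and fuses
# their outputs, in contrast to a stack that repeats one module type.
from typing import Callable, List, Sequence

Module = Callable[[List[float]], List[float]]

def parallel_block(modules: Sequence[Module],
                   fuse: Callable[[List[List[float]]], List[float]],
                   features: List[float]) -> List[float]:
    """Apply every module to the same features, then fuse the results."""
    outputs = [m(features) for m in modules]
    return fuse(outputs)

# Two toy "module types" standing in for different encoder/decoder modules.
scale = lambda x: [2.0 * v for v in x]
shift = lambda x: [v + 1.0 for v in x]

# Element-wise mean fusion integrates the parallel outputs into one
# higher-level representation.
def mean_fuse(outs: List[List[float]]) -> List[float]:
    return [sum(vals) / len(vals) for vals in zip(*outs)]

fused = parallel_block([scale, shift], mean_fuse, [1.0, 2.0])
# fused == [2.0, 3.5]
```

In the paper's setting the toy modules would be replaced by real encoder or decoder blocks (e.g. different Transformer variants) and the fusion step by a learned integration layer; the control flow, however, stays the same.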
Pages: 9