Parallel Image Captioning Using 2D Masked Convolution

Cited by: 3
Authors
Poleak, Chanrith [1]
Kwon, Jangwoo [1]
Affiliations
[1] Inha Univ, Dept Comp Engn, Incheon 402751, South Korea
Source
APPLIED SCIENCES-BASEL | 2019 / Vol. 9 / No. 09
Keywords
computer vision; convolutional networks; LSTM;
DOI
10.3390/app9091871
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Automatically generating a novel description of an image is a challenging and important problem that brings together advanced research in computer vision and natural language processing. In recent years, image captioning performance has improved significantly through the use of long short-term memory (LSTM) networks as the decoder for the language model. Despite this improvement, LSTM has shortcomings of its own: its structure is complicated and it is inherently sequential. This paper proposes a model that uses a simple convolutional network for both the encoder and the decoder of an image captioning system, in place of the current state-of-the-art approach. In experiments on the Microsoft Common Objects in Context (MSCOCO) captioning dataset, the model yielded results competitive with state-of-the-art image captioning models across different evaluation metrics, while being much simpler and allowing parallel graphics processing unit (GPU) computation during training, which results in a faster training time.
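The abstract does not spell out the masking scheme, so the following is only a minimal sketch of the general idea behind masked (causal) convolution, not the authors' implementation. It uses a 1D simplification of the 2D case named in the title, and the function name `causal_conv1d` and the toy values are hypothetical. It illustrates why such a decoder can be trained in parallel: each output position depends only on the current and earlier inputs, so all positions can be computed independently once the full ground-truth caption is available.

```python
def causal_conv1d(seq, kernel):
    """Masked (causal) 1D convolution: output[t] depends only on seq[:t + 1].

    Hypothetical illustration; the paper's 2D masked convolution may differ.
    """
    k = len(kernel)
    # Left-pad with zeros so the window ending at position t never
    # reaches past t, i.e. no future token is visible.
    padded = [0.0] * (k - 1) + list(seq)
    # Each output position is computed independently of the others, which is
    # what lets a framework evaluate all positions in parallel on a GPU.
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(seq))
    ]


tokens = [1.0, 2.0, 3.0, 4.0]   # toy stand-in for embedded caption tokens
kernel = [0.5, 0.25, 0.25]      # toy filter weights

out = causal_conv1d(tokens, kernel)
# Changing the last (future) token leaves all earlier outputs unchanged.
changed = causal_conv1d(tokens[:3] + [100.0], kernel)
assert out[:3] == changed[:3]
```

An LSTM decoder, by contrast, must finish step t before starting step t + 1, because each hidden state is an input to the next one.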
Pages: 13
Related papers
50 in total
  • [1] Dynamically Reconfigurable Parallel Architecture Implementation of 2D Convolution for Image Processing over FPGA
    Jahiruzzaman, Md.
    Saha, Shumit
    Hawlader, Md. Abul Khayum
    [J]. 2ND INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATION COMMUNICATION TECHNOLOGY (ICEEICT 2015), 2015,
  • [2] Image Captioning with Masked Diffusion Model
    Tian, Weidong
    Xu, Wenzheng
    Zhao, Junxiang
    Zhao, Zhongqiu
    [J]. ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT VIII, ICIC 2024, 2024, 14869 : 216 - 227
  • [4] Fast 2D convolution using reconfigurable computing
    Wong, SC
    Jasiunas, M
    Kearney, D
    [J]. ISSPA 2005: The 8th International Symposium on Signal Processing and its Applications, Vols 1 and 2, Proceedings, 2005, : 791 - 794
  • [5] 2D AND 3D OPTIMAL PARALLEL IMAGE WARPING
    WITTENBRINK, CM
    SOMANI, AK
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1995, 25 (02) : 197 - 208
  • [6] Transformer with a Parallel Decoder for Image Captioning
    Wei, Peilang
    Liu, Xu
    Luo, Jun
    Pu, Huayan
    Huang, Xiaoxu
    Wang, Shilong
    Cao, Huajun
    Yang, Shouhong
    Zhuang, Xu
    Wang, Jason
    Yue, Hong
    Ji, Cheng
    Zhou, Mingliang
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2024, 38 (01)
  • [7] 2D bit-parallel flexible optical interconnects using fiber image guides
    Li, Y
    Wang, T
    [J]. HOLOGRAPHIC OPTICAL ELEMENTS AND DISPLAYS, 1996, 2885 : 100 - 111
  • [8] Fast and Robust 2D Minkowski Sum Using Reduced Convolution
    Behar, Evan
    Lien, Jyh-Ming
    [J]. 2011 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, 2011, : 1573 - 1578
  • [9] Noisy Phoneme Recognition Using 2D Convolution Neural Network
    Ramonaite, Justina
    Korvel, Grazina
    [J]. 2023 IEEE 10TH JUBILEE WORKSHOP ON ADVANCES IN INFORMATION, ELECTRONIC AND ELECTRICAL ENGINEERING, AIEEE, 2023,
  • [10] A load-balanced parallel algorithm for 2D image warping
    Jiang, YH
    Chang, ZM
    Yang, XJ
    [J]. PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS, PROCEEDINGS, 2004, 3358 : 735 - 745