An Information Multiplexed Encoder-Decoder Network for Image Captioning in Hindi

Cited by: 3
Authors
Mishra, Santosh Kumar [1]
Peethala, Mahesh Babu [1]
Saha, Sriparna [1]
Bhattacharyya, Pushpak [2]
Affiliations
[1] Indian Inst Technol Patna, Dept Comp Sci & Engn, Patna, Bihar, India
[2] Indian Inst Technol, Mumbai, Maharashtra, India
DOI
10.1109/SMC52423.2021.9658859
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Image captioning is a multi-modal problem that links computer vision and natural language processing, combining the challenges of image analysis and text generation. Most image captioning work in the literature addresses the English language only. This paper proposes a new approach for image captioning in Hindi using a deep learning-based encoder-decoder architecture. Hindi, widely spoken in India and South Asia, is the fourth most spoken language globally and is India's official language. In recent years, significant advances have been made in image captioning using encoder-decoder architectures based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The encoder CNN extracts features from input images, whereas the decoder RNN performs language modeling. The proposed encoder-decoder architecture uses information multiplexing in the encoder CNN to achieve a performance gain in feature extraction. Extensive experiments are carried out on the benchmark MSCOCO Hindi dataset, and significant improvements in BLEU score over the baselines are reported. Manual human evaluation of the adequacy and fluency of the generated captions further establishes the proposed method's efficacy in generating good-quality captions.
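The abstract reports results in terms of BLEU score. As a rough illustration of what that metric measures, the sketch below computes a sentence-level BLEU from clipped n-gram precisions and a brevity penalty. It is not the authors' evaluation code: the function names, the whitespace tokenization, the uniform n-gram weights, the absence of smoothing, and the example caption are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing (assumed setup)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (ties: shorter one).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_avg)


# Hypothetical Hindi caption, tokenized on whitespace for illustration only.
reference = "एक आदमी घोड़े पर सवार है".split()
candidate = "एक आदमी घोड़े पर सवार है".split()
score = bleu(candidate, [reference])
```

A candidate identical to its reference scores 1.0 under this sketch, and a candidate that shares no 4-gram with any reference scores 0.0 because no smoothing is applied. Published results normally use an established implementation and corpus-level aggregation, so values from this sketch are not comparable to the numbers in the paper.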
Pages: 3019-3024
Number of pages: 6
Related papers
50 in total
  • [11] Semantic Enhanced Encoder-Decoder Network (SEN) for Video Captioning
    Gui, Yuling
    Guo, Dan
    Zhao, Ye
    PROCEEDINGS OF THE 2ND WORKSHOP ON MULTIMEDIA FOR ACCESSIBLE HUMAN COMPUTER INTERFACES (MAHCI '19), 2019, : 25 - 32
  • [12] MICER: a pre-trained encoder-decoder architecture for molecular image captioning
    Yi, Jiacai
    Wu, Chengkun
    Zhang, Xiaochen
    Xiao, Xinyi
    Qiu, Yanlong
    Zhao, Wentao
    Hou, Tingjun
    Cao, Dongsheng
    BIOINFORMATICS, 2022, 38 (19) : 4562 - 4572
  • [13] An Improved Encoder-Decoder Network for Ore Image Segmentation
    Yang, Hao
    Huang, Chao
    Wang, Long
    Luo, Xiong
    IEEE SENSORS JOURNAL, 2021, 21 (10) : 11469 - 11475
  • [14] A comprehensive construction of deep neural network-based encoder-decoder framework for automatic image captioning systems
    Rahman, Md Mijanur
    Uzzaman, Ashik
    Sami, Sadia Islam
    Khatun, Fatema
    Bhuiyan, Md Al-Amin
    IET IMAGE PROCESSING, 2024, 18 (14) : 4778 - 4798
  • [15] Using Neural Encoder-Decoder Models With Continuous Outputs for Remote Sensing Image Captioning
    Ramos, Rita
    Martins, Bruno
    IEEE ACCESS, 2022, 10 : 24852 - 24863
  • [16] Whole Image Synthesis Using a Deep Encoder-Decoder Network
    Sevetlidis, Vasileios
    Giuffrida, Mario Valerio
    Tsaftaris, Sotirios A.
    SIMULATION AND SYNTHESIS IN MEDICAL IMAGING, SASHIMI 2016, 2016, 9968 : 127 - 137
  • [17] VISIBLE AND INFRARED IMAGE FUSION USING ENCODER-DECODER NETWORK
    Ataman, Ferhat Can
    Bozdagi Akar, Gozde
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 1779 - 1783
  • [18] Retrieval Augmented Convolutional Encoder-decoder Networks for Video Captioning
    Chen, Jingwen
    Pan, Yingwei
    Li, Yehao
    Yao, Ting
    Chao, Hongyang
    Mei, Tao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (01)
  • [19] Empirical autopsy of deep video captioning encoder-decoder architecture
    Aafaq, Nayyer
    Akhtar, Naveed
    Liu, Wei
    Mian, Ajmal
    ARRAY, 2021, 9
  • [20] Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
    Chen, Jingwen
    Pan, Yingwei
    Li, Yehao
    Yao, Ting
    Chao, Hongyang
    Mei, Tao
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8167 - 8174