An Information Multiplexed Encoder-Decoder Network for Image Captioning in Hindi

Cited by: 3
Authors
Mishra, Santosh Kumar [1]
Peethala, Mahesh Babu [1]
Saha, Sriparna [1]
Bhattacharyya, Pushpak [2]
Affiliations
[1] Indian Inst Technol Patna, Dept Comp Sci & Engn, Patna, Bihar, India
[2] Indian Inst Technol, Mumbai, Maharashtra, India
DOI
10.1109/SMC52423.2021.9658859
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Image captioning is a multi-modal problem linking computer vision and natural language processing, which combines image analysis and text generation challenges. In the literature, most image captioning work has been done for the English language only. This paper proposes a new approach for image captioning in the Hindi language using a deep learning-based encoder-decoder architecture. Hindi, widely spoken in India and South Asia, is the fourth most spoken language globally; it is India's official language. In recent years, significant advances have been made in image captioning using encoder-decoder architectures based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The encoder CNN extracts features from input images, whereas the decoder RNN performs language modeling. The proposed encoder-decoder architecture uses information multiplexing in the encoder CNN to achieve a performance gain in feature extraction. Extensive experimentation is carried out on the benchmark MSCOCO Hindi dataset, and significant improvements in BLEU score are reported compared to the baselines. Manual human evaluation in terms of adequacy and fluency of the generated captions further establishes the proposed method's efficacy in generating good-quality captions.
Pages: 3019-3024
Page count: 6