An Information Multiplexed Encoder-Decoder Network for Image Captioning in Hindi

Cited by: 2
Authors
Mishra, Santosh Kumar [1]
Peethala, Mahesh Babu [1]
Saha, Sriparna [1]
Bhattacharyya, Pushpak [2]
Affiliations
[1] Indian Inst Technol Patna, Dept Comp Sci & Engn, Patna, Bihar, India
[2] Indian Inst Technol, Mumbai, Maharashtra, India
DOI
10.1109/SMC52423.2021.9658859
CLC number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Image captioning is a multi-modal problem at the intersection of computer vision and natural language processing, combining the challenges of image analysis and text generation. Most image captioning work in the literature addresses English only. This paper proposes a new approach to image captioning in Hindi using a deep learning-based encoder-decoder architecture. Hindi, widely spoken in India and South Asia, is the fourth most spoken language globally and is India's official language. In recent years, significant advances have been made in image captioning using encoder-decoder architectures based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs): the encoder CNN extracts features from input images, whereas the decoder RNN performs language modeling. The proposed encoder-decoder architecture uses information multiplexing in the encoder CNN to improve feature extraction. Extensive experiments are carried out on the benchmark MSCOCO Hindi dataset, and significant improvements in BLEU score over the baselines are reported. Manual human evaluation of the adequacy and fluency of the generated captions further establishes the proposed method's efficacy in generating good-quality captions.
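The abstract reports results as BLEU scores without describing the evaluation setup. As a rough illustration only, the following is a minimal sketch of the standard sentence-level BLEU definition (clipped n-gram precision, geometric mean, brevity penalty). It is not the authors' evaluation code; the function names, the whitespace tokenization, and the absence of smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against several references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights (hypothetical helper)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (shorter one on ties).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Illustrative Hindi caption, tokenized on whitespace (an assumption).
reference = "एक आदमी घोड़े पर बैठा है".split()
candidate = "एक आदमी घोड़े पर है".split()
score = bleu(candidate, [reference], max_n=2)
```

Published captioning results are usually corpus-level BLEU-1 through BLEU-4 computed with an established toolkit, so scores from a sketch like this are not directly comparable to the figures in the paper.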
Pages: 3019-3024
Page count: 6
Related papers
50 in total
  • [1] Deep Hierarchical Encoder-Decoder Network for Image Captioning
    Xiao, Xinyu
    Wang, Lingfeng
    Ding, Kun
    Xiang, Shiming
    Pan, Chunhong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (11) : 2942 - 2956
  • [2] Dynamic Convolution-based Encoder-Decoder Framework for Image Captioning in Hindi
    Mishra, Santosh Kumar
    Sinha, Sushant
    Saha, Sriparna
    Bhattacharyya, Pushpak
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (04)
  • [3] Efficient Channel Attention Based Encoder-Decoder Approach for Image Captioning in Hindi
    Mishra, Santosh Kumar
    Rai, Gaurav
    Saha, Sriparna
    Bhattacharyya, Pushpak
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (03)
  • [4] Parallel encoder-decoder framework for image captioning
    Saeidimesineh, Reyhane
    Adibi, Peyman
    Karshenas, Hossein
    Darvishy, Alireza
    [J]. KNOWLEDGE-BASED SYSTEMS, 2023, 282
  • [5] Image Captioning: From Encoder-Decoder to Reinforcement Learning
    Tang, Yu
    [J]. 2022 6TH INTERNATIONAL CONFERENCE ON IMAGING, SIGNAL PROCESSING AND COMMUNICATIONS, ICISPC, 2022, : 6 - 10
  • [6] An encoder-decoder based framework for hindi image caption generation
    Singh, Alok
    Singh, Thoudam Doren
    Bandyopadhyay, Sivaji
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (28-29) : 35721 - 35740
  • [7] The Optimal Choice of the Encoder-Decoder Model Components for Image Captioning
    Bartosiewicz, Mateusz
    Iwanowski, Marcin
    [J]. INFORMATION, 2024, 15 (08)
  • [8] An encoder-decoder based framework for hindi image caption generation
    Alok Singh
    Thoudam Doren Singh
    Sivaji Bandyopadhyay
    [J]. Multimedia Tools and Applications, 2021, 80 : 35721 - 35740
  • [9] Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning
    Yang, Xu
    Gao, Chongyang
    Zhang, Hanwang
    Cai, Jianfei
    [J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 4181 - 4189
  • [10] Semantic Enhanced Encoder-Decoder Network (SEN) for Video Captioning
    Gui, Yuling
    Guo, Dan
    Zhao, Ye
    [J]. PROCEEDINGS OF THE 2ND WORKSHOP ON MULTIMEDIA FOR ACCESSIBLE HUMAN COMPUTER INTERFACES (MAHCI '19), 2019, : 25 - 32