Efficient Channel Attention Based Encoder-Decoder Approach for Image Captioning in Hindi

Authors
Mishra, Santosh Kumar [1]
Rai, Gaurav [2]
Saha, Sriparna [1]
Bhattacharyya, Pushpak [3]
Affiliations
[1] Indian Institute of Technology Patna, Patna 801106, Bihar, India
[2] National Institute of Technology Patna, Patna 800005, Bihar, India
[3] Indian Institute of Technology Bombay, Mumbai 400076, Maharashtra, India
Keywords
Image captioning; channel attention; deep learning; attention; Hindi
DOI
10.1145/3483597
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image captioning refers to the process of generating a textual description of the objects and activities present in a given image. It connects two fields of artificial intelligence: computer vision, which deals with image understanding, and natural language processing, which deals with language modeling. Most existing work on image captioning targets the English language. This article presents a novel method for image captioning in Hindi using an encoder-decoder deep learning architecture with efficient channel attention. The key contribution of this work is the deployment of an efficient channel attention mechanism together with Bahdanau attention and a gated recurrent unit (GRU) to develop an image captioning model for Hindi. Color images usually consist of three channels, namely red, green, and blue, and the feature maps computed inside a convolutional network contain many more. The channel attention mechanism focuses on the important channels of these feature maps while performing the convolution, assigning higher importance to specific channels over others. Channel attention has been shown to have great potential for improving the performance of deep convolutional neural networks (CNNs). The proposed encoder-decoder architecture uses the recently introduced ECA-Net CNN to integrate the channel attention mechanism. Hindi is the fourth most spoken language globally, widely spoken in India and South Asia; it is India's official language. A dataset for image captioning in Hindi is created by manually translating the well-known MSCOCO dataset from English to Hindi. The proposed method is compared with other baselines in terms of Bilingual Evaluation Understudy (BLEU) scores, and the results show that it outperforms them.
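The channel attention step can be illustrated with a minimal pure-Python sketch of an ECA block, following the structure given in the ECA-Net formulation: global average pooling per channel, a 1D convolution across neighbouring channels whose kernel size is chosen adaptively from the channel count, a sigmoid, and channel-wise rescaling. This is not the authors' code. The kernel weights are hypothetical stand-ins for learned parameters, feature maps are plain nested lists, and the defaults gamma=2 and b=1 are assumptions taken from the ECA-Net formulation rather than from this article.

```python
import math
from typing import List

Channel = List[List[float]]  # one H x W feature map


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive kernel size k = |(log2(C) + b) / gamma|, forced to be odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1  # round an even value up to the next odd one


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def eca(features: List[Channel], weights: List[float]) -> List[Channel]:
    """Reweight C feature maps with efficient channel attention.

    `weights` is a hypothetical learned 1D kernel of odd length k.
    """
    k = len(weights)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    c = len(features)
    # 1. Global average pooling: one scalar descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]
    # 2. 1D convolution (cross-correlation, no bias) over neighbouring channels;
    #    zero padding keeps the output length equal to C.
    pad = (k - 1) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    scores = [sum(weights[j] * padded[i + j] for j in range(k)) for i in range(c)]
    # 3. Sigmoid turns each score into a channel weight in (0, 1).
    gates = [sigmoid(s) for s in scores]
    # 4. Rescale every value of each channel by that channel's weight.
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, features)]


# Usage: three 2 x 2 channels and an identity kernel [0, 1, 0].
fmap = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
out = eca(fmap, [0.0, 1.0, 0.0])
```

Because the 1D convolution mixes only k neighbouring channel descriptors, the block adds just k weights per layer and avoids the dimensionality reduction used in squeeze-and-excitation blocks, which is the design motivation given for ECA-Net. With the identity kernel in the usage example, each channel is simply scaled by the sigmoid of its own mean.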
The proposed method attains improvements of 0.59%, 2.51%, 4.38%, and 3.30% in BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores, respectively, over the state-of-the-art. The quality of the generated captions is further assessed manually in terms of adequacy and fluency to illustrate the efficacy of the proposed method.
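The BLEU-1 to BLEU-4 scores can be illustrated with a minimal sentence-level BLEU sketch: clipped (modified) n-gram precision for n = 1..N, combined by a uniform geometric mean and multiplied by a brevity penalty based on the closest reference length. It assumes the common convention that BLEU-n uses uniform weights over 1- to n-grams and that captions are already tokenized. The exact scoring setup used in the article (corpus- versus sentence-level scoring, smoothing, tokenization) is not stated in this abstract, so numbers from this sketch are not directly comparable to the reported improvements.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """BLEU-max_n with uniform weights, clipped counts and brevity penalty (no smoothing)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, cnt in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand_counts.items())
        if clipped == 0:
            return 0.0  # log(0) is undefined; unsmoothed BLEU collapses to 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


# Usage: BLEU-2 for one candidate caption against one reference.
refs = [["a", "man", "is", "riding", "a", "horse"]]
cand = ["a", "man", "riding", "a", "horse"]
score = sentence_bleu(cand, refs, max_n=2)
```

Calling `sentence_bleu(cand, refs, max_n=4)` gives the BLEU-4 variant. Without smoothing, any candidate that shares no 4-gram with its references scores 0, which is one reason BLEU is commonly aggregated over a whole test set rather than averaged per sentence.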
Pages: 17