A Deep Attention based Framework for Image Caption Generation in Hindi Language

被引:0
|
作者
Dhir, Rijul [1 ]
Mishra, Santosh Kumar [1 ]
Saha, Sriparna [1 ]
Bhattacharyya, Pushpak [1 ]
机构
[1] Indian Inst Technol Patna, Patna, Bihar, India
来源
COMPUTACION Y SISTEMAS | 2019年 / 23卷 / 03期
关键词
Image captioning; Hindi language; convolutional neural network; recurrent neural network; gated recurrent unit; attention mechanism;
D O I
10.13053/CyS-23-3-3269
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Image captioning refers to the process of generating a textual description for an image which defines the object and activity within the image. It is an intersection of computer vision and natural language processing where computer vision is used to understand the content of an image and language modelling from natural language processing is used to convert an image into words in the right order. A large number of works exist for generating image captioning in English language, but no work exists for generating image captioning in Hindi language. Hindi is the official language of India, and it is the fourth most-spoken language in the world, after Mandarin, Spanish and English. The current paper attempts to bridge this gap. Here an attention-based novel architecture for generating image captioning in Hindi language is proposed. Convolution neural network is used as an encoder to extract features from an input image and gated recurrent unit based neural network is used as a decoder to perform language modelling up to the word level. In between, we have used the attention mechanism which helps the decoder to look into the important portions of the image. In order to show the efficacy of the proposed model, we have first created a manually annotated image captioning training corpus in Hindi corresponding to popular MS COCO English dataset having around 80000 images. Experimental results show that our proposed model attains a BLEU1 score of 0.5706 on this data set.
引用
收藏
页码:693 / 701
页数:9
相关论文
共 50 条
  • [1] A Hindi Image Caption Generation Framework Using Deep Learning
    Mishra, Santosh Kumar
    Dhir, Rijul
    Saha, Sriparna
    Bhattacharyya, Pushpak
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (02)
  • [2] An encoder-decoder based framework for hindi image caption generation
    Alok Singh
    Thoudam Doren Singh
    Sivaji Bandyopadhyay
    [J]. Multimedia Tools and Applications, 2021, 80 : 35721 - 35740
  • [3] An encoder-decoder based framework for hindi image caption generation
    Singh, Alok
    Singh, Thoudam Doren
    Bandyopadhyay, Sivaji
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (28-29) : 35721 - 35740
  • [4] Attention based sequence-to-sequence framework for auto image caption generation
    Khan, Rashid
    Islam, M. Shujah
    Kanwal, Khadija
    Iqbal, Mansoor
    Hossain, Md Imran
    Ye, Zhongfu
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (01) : 159 - 170
  • [5] Bahdanau Attention Based Bengali Image Caption Generation
    Alam, Md Sahrial
    Rahman, Md Sayedur
    Hosen, Md Ikbal
    Mubin, Khairul Anam
    Hossen, Sharif
    Mridha, M. F.
    [J]. 2022 INTERNATIONAL CONFERENCE ON DECISION AID SCIENCES AND APPLICATIONS (DASA), 2022, : 1073 - 1077
  • [6] Automatic image caption generation using deep learning and multimodal attention
    Dai, Jin
    Zhang, Xinyu
    [J]. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2022, 33 (3-4)
  • [7] Automatic Generation of Image Caption Based on Semantic Relation using Deep Visual Attention Prediction
    El-gayar, M. M.
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (09) : 105 - 114
  • [8] Image caption generation method based on adaptive attention mechanism
    Jin, Huazhong
    Wu, Yu
    Wan, Fang
    Hu, Man
    Li, Qingqing
    [J]. MIPPR 2019: PATTERN RECOGNITION AND COMPUTER VISION, 2020, 11430
  • [9] Image caption generation with dual attention mechanism
    Liu, Maofu
    Li, Lingjun
    Hu, Huijun
    Guan, Weili
    Tian, Jing
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2020, 57 (02)
  • [10] Image Caption Generation Using Attention Model
    Ramalakshmi, Eliganti
    Jain, Moksh Sailesh
    Uddin, Mohammed Ameer
    [J]. INNOVATIVE DATA COMMUNICATION TECHNOLOGIES AND APPLICATION, ICIDCA 2021, 2022, 96 : 1009 - 1017