A Deep Attention based Framework for Image Caption Generation in Hindi Language

Cited by: 0
Authors
Dhir, Rijul [1 ]
Mishra, Santosh Kumar [1 ]
Saha, Sriparna [1 ]
Bhattacharyya, Pushpak [1 ]
Institution
[1] Indian Inst Technol Patna, Patna, Bihar, India
Source
COMPUTACION Y SISTEMAS, 2019, Vol. 23, No. 3
Keywords
Image captioning; Hindi language; convolutional neural network; recurrent neural network; gated recurrent unit; attention mechanism;
DOI
10.13053/CyS-23-3-3269
CLC classification
TP [Automation technology, computer technology]
Subject classification
0812
Abstract
Image captioning refers to the process of generating a textual description of an image that describes the objects and activities within it. It lies at the intersection of computer vision and natural language processing: computer vision is used to understand the content of the image, and language modelling from natural language processing is used to convert that content into words in the right order. A large body of work exists on generating image captions in English, but no work exists on generating image captions in Hindi. Hindi is the official language of India and the fourth most-spoken language in the world, after Mandarin, Spanish, and English. The current paper attempts to bridge this gap. A novel attention-based architecture for generating image captions in Hindi is proposed. A convolutional neural network is used as an encoder to extract features from the input image, and a gated recurrent unit (GRU) based neural network is used as a decoder to perform language modelling at the word level. In between, an attention mechanism helps the decoder focus on the important portions of the image. To show the efficacy of the proposed model, we first created a manually annotated Hindi image captioning training corpus corresponding to the popular English MS COCO dataset, which contains around 80,000 images. Experimental results show that the proposed model attains a BLEU-1 score of 0.5706 on this dataset.
Pages: 693-701
Number of pages: 9
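
The abstract describes a standard encoder-decoder captioning pipeline: a convolutional encoder produces image features, a GRU decoder generates Hindi words, and an attention module lets the decoder weight image regions at each step. The sketch below only illustrates that general scheme and is not the authors' implementation; the choice of PyTorch, the ResNet-50 backbone, and all layer sizes are assumptions.

# Illustrative sketch (not the authors' exact model): CNN encoder, additive
# attention, and a GRU-cell decoder, as outlined in the abstract above.
import torch
import torch.nn as nn
import torchvision.models as models

class Encoder(nn.Module):
    """CNN backbone that yields a grid of spatial feature vectors."""
    def __init__(self):
        super().__init__()
        cnn = models.resnet50(weights=None)            # any pretrained CNN works
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])

    def forward(self, images):                         # (B, 3, H, W)
        feats = self.backbone(images)                  # (B, 2048, h, w)
        return feats.flatten(2).transpose(1, 2)        # (B, h*w, 2048)

class Attention(nn.Module):
    """Additive (Bahdanau-style) attention over the spatial feature grid."""
    def __init__(self, feat_dim, hid_dim, att_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, att_dim)
        self.hid_proj = nn.Linear(hid_dim, att_dim)
        self.score = nn.Linear(att_dim, 1)

    def forward(self, feats, hidden):                  # feats: (B, L, F), hidden: (B, H)
        e = self.score(torch.tanh(self.feat_proj(feats) +
                                  self.hid_proj(hidden).unsqueeze(1)))  # (B, L, 1)
        alpha = torch.softmax(e, dim=1)                # weights over the L regions
        context = (alpha * feats).sum(dim=1)           # (B, F) attended image vector
        return context, alpha

class Decoder(nn.Module):
    """GRU decoder that predicts the next Hindi word from the attended context."""
    def __init__(self, vocab_size, feat_dim=2048, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.attend = Attention(feat_dim, hid_dim, att_dim=256)
        self.gru = nn.GRUCell(emb_dim + feat_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def step(self, prev_word, hidden, feats):          # prev_word: (B,) word indices
        context, _ = self.attend(feats, hidden)
        hidden = self.gru(torch.cat([self.embed(prev_word), context], dim=1), hidden)
        return self.out(hidden), hidden                # logits over the Hindi vocabulary

At inference time the decoder would be run step by step from a start token, feeding back the predicted word, until an end token or a maximum caption length is reached.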