A Deep Attention based Framework for Image Caption Generation in Hindi Language

Cited by: 0
Authors
Dhir, Rijul [1 ]
Mishra, Santosh Kumar [1 ]
Saha, Sriparna [1 ]
Bhattacharyya, Pushpak [1 ]
Institution
[1] Indian Inst Technol Patna, Patna, Bihar, India
Source
COMPUTACION Y SISTEMAS, 2019, Vol. 23, No. 3
Keywords
Image captioning; Hindi language; convolutional neural network; recurrent neural network; gated recurrent unit; attention mechanism;
DOI
10.13053/CyS-23-3-3269
CLC classification
TP [Automation technology, computer technology]
Subject classification
0812
Abstract
Image captioning refers to the process of generating a textual description of an image that describes the objects and activities within it. It lies at the intersection of computer vision and natural language processing: computer vision is used to understand the content of the image, and language modelling from natural language processing is used to convert that content into words in the right order. A large body of work exists on generating image captions in English, but no work exists on generating image captions in Hindi. Hindi is the official language of India and the fourth most-spoken language in the world, after Mandarin, Spanish, and English. The current paper attempts to bridge this gap. A novel attention-based architecture for generating image captions in Hindi is proposed. A convolutional neural network is used as an encoder to extract features from the input image, and a gated recurrent unit (GRU) based neural network is used as a decoder to perform language modelling at the word level. In between, an attention mechanism helps the decoder focus on the important portions of the image. To show the efficacy of the proposed model, we first created a manually annotated Hindi image captioning training corpus corresponding to the popular English MS COCO dataset, which contains around 80,000 images. Experimental results show that the proposed model attains a BLEU-1 score of 0.5706 on this dataset.
Pages: 693-701
Number of pages: 9
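
The abstract describes a standard encoder-decoder captioning pipeline: a convolutional encoder produces image features, a GRU decoder generates Hindi words, and an attention module lets the decoder weight image regions at each step. The sketch below only illustrates that general scheme and is not the authors' implementation; the choice of PyTorch, the ResNet-50 backbone, and all layer sizes are assumptions.

# Illustrative sketch (not the authors' exact model): CNN encoder, additive
# attention, and a GRU-cell decoder, as outlined in the abstract above.
import torch
import torch.nn as nn
import torchvision.models as models

class Encoder(nn.Module):
    """CNN backbone that yields a grid of spatial feature vectors."""
    def __init__(self):
        super().__init__()
        cnn = models.resnet50(weights=None)            # any pretrained CNN works
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])

    def forward(self, images):                         # (B, 3, H, W)
        feats = self.backbone(images)                  # (B, 2048, h, w)
        return feats.flatten(2).transpose(1, 2)        # (B, h*w, 2048)

class Attention(nn.Module):
    """Additive (Bahdanau-style) attention over the spatial feature grid."""
    def __init__(self, feat_dim, hid_dim, att_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, att_dim)
        self.hid_proj = nn.Linear(hid_dim, att_dim)
        self.score = nn.Linear(att_dim, 1)

    def forward(self, feats, hidden):                  # feats: (B, L, F), hidden: (B, H)
        e = self.score(torch.tanh(self.feat_proj(feats) +
                                  self.hid_proj(hidden).unsqueeze(1)))  # (B, L, 1)
        alpha = torch.softmax(e, dim=1)                # weights over the L regions
        context = (alpha * feats).sum(dim=1)           # (B, F) attended image vector
        return context, alpha

class Decoder(nn.Module):
    """GRU decoder that predicts the next Hindi word from the attended context."""
    def __init__(self, vocab_size, feat_dim=2048, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.attend = Attention(feat_dim, hid_dim, att_dim=256)
        self.gru = nn.GRUCell(emb_dim + feat_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def step(self, prev_word, hidden, feats):          # prev_word: (B,) word indices
        context, _ = self.attend(feats, hidden)
        hidden = self.gru(torch.cat([self.embed(prev_word), context], dim=1), hidden)
        return self.out(hidden), hidden                # logits over the Hindi vocabulary

At inference time the decoder would be run step by step from a start token, feeding back the predicted word, until an end token or a maximum caption length is reached.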