Recognition of cursive video text using a deep learning framework

Cited by: 6

Authors:

Mirza, Ali [1]

Siddiqi, Imran [1]

Affiliations:

[1] Bahria Univ, Dept Comp Sci, Islamabad, Pakistan

Source:

IET IMAGE PROCESSING | 2020 / Vol. 14 / Issue 14

Keywords:

optical character recognition; feature extraction; recurrent neural nets; video signal processing; information retrieval; content-based retrieval; learning (artificial intelligence); text analysis; image segmentation; video retrieval; video frames; News channel videos; character recognition rate; Urdu text; cursive scripts; cursive video text; deep learning; textual content-based retrieval system; text regions; video optical character recognition systems; video text recognition; mature V-OCRs; noncursive scripts; complex ligatures; overlapping ligatures; context-dependent shape variations; cursive caption text; convolutional networks; end-to-end framework; convolutional neural network; feature sequence extraction; bi-directional recurrent neural networks; sequence-to-sequence mapping; text lines extraction; background segmentation; TRAFFIC SIGNS RECOGNITION; CHARACTER-RECOGNITION; SCENE; REPRESENTATION; SEGMENTATION; STROKELETS; FEATURES;

DOI:

10.1049/iet-ipr.2019.1070

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This study focuses on the recognition of cursive text appearing in videos, using a complete framework of deep neural networks. While mature video optical character recognition systems (V-OCRs) are available for text in non-cursive scripts, recognition of cursive scripts is marked by many challenges. These include complex and overlapping ligatures, context-dependent shape variations and the presence of a large number of dots and diacritics. The authors present an analytical technique for recognition of cursive caption text that relies on a combination of convolutional and recurrent neural networks trained in an end-to-end framework. Text lines extracted from video frames are preprocessed to segment the background and are fed to a convolutional neural network for feature extraction. The extracted feature sequences are fed to different variants of bi-directional recurrent neural networks along with the ground truth transcription to learn sequence-to-sequence mapping. Finally, a connectionist temporal classification layer is employed to produce the final transcription. Experiments on a data set of more than 40,000 text lines from 11,192 video frames of various news channel videos yielded an overall character recognition rate of 97.63%. The proposed work employs Urdu text as a case study, but the findings can be generalised to other cursive scripts as well.
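
As an illustration of the final stage mentioned in the abstract, the following is a minimal sketch of how per-frame scores from a connectionist temporal classification (CTC) layer might be turned into a transcription, and how a character recognition rate could be computed against the ground truth. The abstract does not specify the decoding strategy or the exact metric definition, so the greedy (best-path) decoding, the blank index `BLANK = 0`, and the edit-distance-based rate used here are assumptions; all function names are hypothetical.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label (not stated in the abstract)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    decoded: List[int] = []
    previous = None
    for label in best_path:
        # A label is emitted only when it differs from the previous frame's
        # label and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance computed with a single rolling row."""
    row = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        diagonal, row[0] = row[0], i
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            diagonal, row[j] = row[j], min(row[j] + 1,        # deletion
                                           row[j - 1] + 1,    # insertion
                                           diagonal + cost)   # substitution
    return row[-1]


def character_recognition_rate(predicted: Sequence, truth: Sequence) -> float:
    """Assumed metric: 100 * (1 - edit_distance / len(truth)), floored at 0."""
    if len(truth) == 0:
        return 100.0 if len(predicted) == 0 else 0.0
    errors = edit_distance(predicted, truth)
    return max(0.0, 100.0 * (1.0 - errors / len(truth)))


if __name__ == "__main__":
    # Toy scores over the label set {0: blank, 1: 'a', 2: 'b'}.
    frames = [
        [0.1, 0.8, 0.1],    # a
        [0.2, 0.7, 0.1],    # a (repeat, merged)
        [0.9, 0.05, 0.05],  # blank (separates the two a's)
        [0.1, 0.8, 0.1],    # a
        [0.1, 0.2, 0.7],    # b
        [0.1, 0.1, 0.8],    # b (repeat, merged)
    ]
    labels = ctc_greedy_decode(frames)
    print(labels)
    print(character_recognition_rate(labels, [1, 1, 2]))
```

The blank frame between the two runs of label 1 is what lets CTC represent a doubled character; without it, the repeated frames would collapse into a single symbol.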


Pages: 3444-3455

Page count: 12
