Recognition of cursive video text using a deep learning framework

Cited by: 6
Authors
Mirza, Ali [1]
Siddiqi, Imran [1]
Affiliations
[1] Bahria Univ, Dept Comp Sci, Islamabad, Pakistan
Keywords
optical character recognition; feature extraction; recurrent neural nets; video signal processing; information retrieval; content-based retrieval; learning (artificial intelligence); text analysis; image segmentation; video retrieval; video frames; News channel videos; character recognition rate; Urdu text; cursive scripts; cursive video text; deep learning; textual content-based retrieval system; text regions; video optical character recognition systems; video text recognition; mature V-OCRs; noncursive scripts; complex ligatures; overlapping ligatures; context-dependent shape variations; cursive caption text; convolutional networks; end-to-end framework; convolutional neural network; feature sequence extraction; bi-directional recurrent neural networks; sequence-to-sequence mapping; text lines extraction; background segmentation; TRAFFIC SIGNS RECOGNITION; CHARACTER-RECOGNITION; SCENE; REPRESENTATION; SEGMENTATION; STROKELETS; FEATURES;
DOI
10.1049/iet-ipr.2019.1070
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This study addresses the recognition of cursive text appearing in videos using a complete framework of deep neural networks. While mature video optical character recognition systems (V-OCRs) are available for text in non-cursive scripts, recognition of cursive scripts poses many challenges, including complex and overlapping ligatures, context-dependent shape variations, and a large number of dots and diacritics. The authors present an analytical technique for recognition of cursive caption text that relies on a combination of convolutional and recurrent neural networks trained in an end-to-end framework. Text lines extracted from video frames are preprocessed to segment the background and are fed to a convolutional neural network for feature extraction. The extracted feature sequences are fed, along with the ground truth transcription, to different variants of bi-directional recurrent neural networks to learn the sequence-to-sequence mapping. Finally, a connectionist temporal classification layer produces the final transcription. Experiments on a data set of more than 40,000 text lines from 11,192 video frames of various news channel videos yielded an overall character recognition rate of 97.63%. The proposed work employs Urdu text as a case study, but the findings can be generalised to other cursive scripts as well.
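The abstract does not say how the connectionist temporal classification (CTC) output is decoded or how the character recognition rate is computed. As a rough illustration only, the sketch below assumes greedy (best-path) decoding and an edit-distance-based rate, 1 - errors / reference characters. All function names, the blank index, and the scoring formula are assumptions for illustration, not the authors' implementation.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (not stated in the abstract)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path decoding: take the argmax per frame, merge repeats, drop blanks."""
    decoded: List[int] = []
    previous = blank
    for scores in frame_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a label only when it is not blank and differs from the previous frame.
        if best != blank and best != previous:
            decoded.append(best)
        previous = best
    return decoded


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance computed row by row."""
    previous_row = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current_row = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous_row[j - 1] + (ref_char != hyp_char)
            current_row.append(min(previous_row[j] + 1,      # deletion
                                   current_row[j - 1] + 1,   # insertion
                                   substitution))
        previous_row = current_row
    return previous_row[-1]


def character_recognition_rate(references: Sequence[str],
                               hypotheses: Sequence[str]) -> float:
    """Assumed definition: 1 - (total edit errors / total reference characters)."""
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return 1.0 - total_errors / total_chars
```

In this sketch, each row of `frame_scores` stands in for the per-frame class scores that a recurrent network would emit for one text line; mapping the decoded indices back to Urdu characters would need a label table that the abstract does not describe.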
Pages: 3444-3455
Number of pages: 12