Devanagari Text Recognition: A Transcription Based Formulation

Cited by: 4
Authors
Sankaran, Naveen [1]
Neelappa, Aman [1]
Jawahar, C. V. [1]
Affiliations
[1] Int Inst Informat Technol, Hyderabad, Andhra Pradesh, India
DOI
10.1109/ICDAR.2013.139
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
For most Indian scripts, Optical Character Recognition (OCR) is often formulated as an isolated character (symbol) classification task followed by a post-classification stage (with modules such as Unicode generation and error correction) that generates the textual representation. Such approaches are prone to failure because of (i) the difficulty of designing a reliable word-to-symbol segmentation module that works robustly on degraded (cut or fused) images and (ii) the difficulty of converting the classifier outputs into a valid sequence of Unicode code points. In this paper, we propose a formulation in which the expectations on these two modules are minimized, and the harder recognition task is modelled as learning an appropriate sequence-to-sequence translation scheme. We thus formulate recognition as a direct transcription problem. Given many examples of feature sequences and their corresponding Unicode representations, our objective is to learn a mapping that converts a word directly into a Unicode sequence. This formulation has multiple practical advantages: (i) it significantly reduces the number of classes for Indian scripts; (ii) it removes the need for reliable word-to-symbol segmentation; (iii) it does not require strong annotation of symbols to design the classifiers; and (iv) it directly generates a valid sequence of Unicode code points. We test our method on more than 6000 pages of printed Devanagari documents from multiple sources. Our method consistently outperforms other state-of-the-art implementations.
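As a rough illustration of what "direct transcription" means in practice, the Python sketch below shows (a) that the training target for a word is simply its Unicode code point sequence, with no symbol-level annotation, and (b) one possible way to turn per-frame predictions into such a sequence. The abstract does not name a model or decoder, so the frame-wise labelling with a blank symbol (CTC-style greedy decoding) is an assumption made for illustration only, and the function names `transcription_target` and `collapse_frame_labels` are hypothetical.

```python
from typing import List, Optional, Sequence


def transcription_target(word: str) -> List[str]:
    """Return the code points of `word` in order, formatted as U+XXXX.

    In a transcription formulation, this sequence is the whole label for a
    word image: no alignment between image regions and symbols is needed.
    """
    return [f"U+{ord(ch):04X}" for ch in word]


def collapse_frame_labels(frame_labels: Sequence[Optional[str]],
                          blank: Optional[str] = None) -> str:
    """Greedy CTC-style decoding (an assumed decoder, not taken from the abstract).

    Merge consecutive repeated labels, then drop the blank symbol.
    """
    output = []
    previous = blank
    for label in frame_labels:
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return "".join(output)


if __name__ == "__main__":
    # Devanagari word written with escapes: KA, VIRAMA, SSA, TA, VIRAMA, RA, VOWEL SIGN I, YA
    word = "\u0915\u094D\u0937\u0924\u094D\u0930\u093F\u092F"
    print(transcription_target(word))

    # Hypothetical per-frame outputs for the first conjunct (KA + VIRAMA + SSA).
    frames = ["\u0915", "\u0915", None, "\u094D", None, "\u0937", "\u0937"]
    print(collapse_frame_labels(frames) == "\u0915\u094D\u0937")
```

The output classes here are individual code points from the Devanagari Unicode block (U+0900 to U+097F) and not the much larger set of conjunct symbols, which is the sense in which the class count shrinks.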
Pages: 678-682
Page count: 5