Devanagari Text Recognition: A Transcription Based Formulation

Cited by: 4
Authors
Sankaran, Naveen [1 ]
Neelappa, Aman [1 ]
Jawahar, C. V. [1 ]
Affiliation
[1] Int Inst Informat Technol, Hyderabad, Andhra Pradesh, India
DOI
10.1109/ICDAR.2013.139
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
For most Indian scripts, Optical Character Recognition (OCR) problems are often formulated as an isolated character (symbol) classification task followed by a post-classification stage (which contains modules such as Unicode generation and error correction) to generate the textual representation. Such approaches are prone to failure due to (i) the difficulty of designing a reliable word-to-symbol segmentation module that works robustly in the presence of degraded (cut/fused) images, and (ii) the difficulty of converting the classifier outputs into a valid sequence of Unicodes. In this paper, we propose a formulation in which the expectations on these two modules are minimized, and the harder recognition task is modelled as learning an appropriate sequence-to-sequence translation scheme. We thus formulate recognition as a direct transcription problem. Given many examples of feature sequences and their corresponding Unicode representations, our objective is to learn a mapping that converts a word directly into a Unicode sequence. This formulation has multiple practical advantages: (i) it significantly reduces the number of classes for Indian scripts; (ii) it removes the need for reliable word-to-symbol segmentation; (iii) it does not require strong annotation of symbols to design the classifiers; and (iv) it directly generates a valid sequence of Unicodes. We test our method on more than 6000 pages of printed Devanagari documents from multiple sources. Our method consistently outperforms other state-of-the-art implementations.
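To make the target side of the transcription formulation concrete, the following minimal sketch shows how a Devanagari word decomposes into a sequence of Unicode code points. It is an illustration only, not the authors' implementation: the example word and the helper name `unicode_labels` are assumptions introduced here. The only external fact it relies on is that the Devanagari Unicode block spans U+0900 to U+097F, so a label set drawn from it is bounded by 128 code points, whereas conjunct symbols are formed by combining those code points.

```python
import unicodedata


def unicode_labels(word: str) -> list[str]:
    """Hypothetical helper: return a word's code points as a label sequence."""
    return [f"U+{ord(ch):04X}" for ch in word]


# Illustrative word chosen for this sketch (not taken from the paper).
# It contains two conjuncts, each written as consonant + virama + consonant.
word = "\u0915\u094d\u0937\u0924\u094d\u0930\u093f\u092f"

labels = unicode_labels(word)

# Print each code point with its official Unicode character name.
for ch in word:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Every label lies inside the Devanagari block (U+0900 to U+097F).
in_block = all(0x0900 <= ord(ch) <= 0x097F for ch in word)

# The virama (U+094D) occurs twice, so the word uses fewer distinct labels
# than it has code points.
distinct_labels = len(set(word))
```

In this sketch the word is transcribed as eight code points drawn from seven distinct labels, and the conjuncts need no classes of their own because they are spelled out with the shared virama code point.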
Pages: 678-682
Page count: 5