Multi-script line identification from Indian documents

Authors
Pal, U [1 ]
Sinha, S [1 ]
Chaudhuri, BB [1 ]
Affiliation
[1] Indian Stat Inst, Comp Vis & Pattern Recognit Unit, Kolkata 700108, India
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A document page may contain two or more different scripts. For Optical Character Recognition (OCR) of such a page, the scripts must be separated before each is fed to its own OCR system. This paper presents an automatic scheme for identifying text lines of different Indian scripts in a document. For the separation task, the scripts are first grouped into a few classes according to script characteristics. Next, features based on the water reservoir principle, contour tracing, profiles, etc. are employed to identify the scripts without any expensive OCR-like algorithms. At present, the system has an overall accuracy of about 97.52%.
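The abstract names profile features without giving their definition. As a rough illustration only, and not the authors' method, the sketch below computes one common profile feature: a row-wise (horizontal) projection of a binarized text line, used here to test for a headline stroke of the kind found in Devanagari and Bangla. The function names, the 0/1 list-of-rows image format, and the `min_fill` threshold are all assumptions made for this example.

```python
from typing import List

Image = List[List[int]]  # hypothetical format: rows of 0 (background) / 1 (ink)


def horizontal_profile(line_img: Image) -> List[int]:
    """Count ink pixels in each row of a binary text-line image."""
    return [sum(row) for row in line_img]


def has_headline(line_img: Image, min_fill: float = 0.7) -> bool:
    """Return True if some row in the upper half of the line is inked
    across at least `min_fill` of the line width.

    The 0.7 default is an illustrative assumption, not a value from the paper.
    """
    if not line_img or not line_img[0]:
        return False
    width = len(line_img[0])
    profile = horizontal_profile(line_img)
    upper_half = profile[: max(1, len(profile) // 2)]
    return max(upper_half) >= min_fill * width


# Toy line with a long horizontal stroke near the top.
headline_line = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

# Toy line with no dominant horizontal stroke.
plain_line = [
    [0, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1],
]

print(has_headline(headline_line), has_headline(plain_line))
```

A coarse test like this could serve for the first grouping step the abstract describes, splitting lines into headline and non-headline classes before finer features are applied; real scans would need binarization and noise handling first.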
Pages: 880-884
Page count: 5