Multi-script line identification from Indian documents

Authors
Pal, U [1]
Sinha, S [1]
Chaudhuri, BB [1]
Affiliation
[1] Indian Stat Inst, Comp Vis & Pattern Recognit Unit, Kolkata 700108, India
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A document page may contain two or more different scripts. For Optical Character Recognition (OCR) of such a page, the different scripts must be separated before they are fed to their respective OCR systems. In this paper, an automatic scheme is presented to identify text lines of different Indian scripts in a document. For the separation task, the scripts are first grouped into a few classes according to their characteristics. Next, features based on the water reservoir principle, contour tracing, profiles, etc. are employed to identify the scripts without any expensive OCR-like algorithm. At present, the system has an overall accuracy of about 97.52%.
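The abstract names the water reservoir principle and profiles as features but does not give their exact definitions. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes a binary image stored as a list of rows (1 = ink, 0 = background) and approximates a "top reservoir" (the background region that would hold water poured onto a component from the top) by treating the component's upper contour as 1-D terrain. It also computes a horizontal projection profile and returns its peak row, which for scripts with a headline such as Bangla or Devanagari tends to fall on the headline. The function names `column_heights`, `top_reservoir_depths` and `headline_row` are hypothetical.

```python
from typing import List

# Hypothetical representation: binary image as a list of rows, 1 = ink, 0 = background.
Image = List[List[int]]


def column_heights(img: Image) -> List[int]:
    """Height of the topmost ink pixel in each column, measured from the bottom.

    Columns with no ink get height 0.
    """
    rows = len(img)
    cols = len(img[0]) if rows else 0
    heights = []
    for c in range(cols):
        h = 0
        for r in range(rows):
            if img[r][c]:
                h = rows - r  # first ink pixel found when scanning down from the top
                break
        heights.append(h)
    return heights


def top_reservoir_depths(img: Image) -> List[int]:
    """Water depth per column if water is poured onto the component from the top.

    Simplifying assumption: the upper contour is treated as 1-D terrain, so a
    column holds water up to the lower of the highest walls to its left and
    right (the classic "trapped rain water" computation). Overhangs and
    enclosed holes are ignored.
    """
    h = column_heights(img)
    n = len(h)
    left_max = [0] * n
    right_max = [0] * n
    running = 0
    for i in range(n):  # highest wall seen so far, scanning left to right
        running = max(running, h[i])
        left_max[i] = running
    running = 0
    for i in range(n - 1, -1, -1):  # highest wall seen so far, scanning right to left
        running = max(running, h[i])
        right_max[i] = running
    return [min(left_max[i], right_max[i]) - h[i] for i in range(n)]


def headline_row(img: Image) -> int:
    """Row index with the most ink pixels (peak of the horizontal projection profile)."""
    profile = [sum(row) for row in img]
    return max(range(len(profile)), key=profile.__getitem__)


if __name__ == "__main__":
    u_shape = [
        [1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [1, 1, 1, 1, 1],
    ]
    t_shape = [
        [1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
    ]
    print(top_reservoir_depths(u_shape))  # [0, 3, 3, 3, 0]
    print(top_reservoir_depths(t_shape))  # [0, 0, 0, 0, 0]
    print(headline_row(t_shape))          # 0
```

In this toy example the U-shaped component traps water in its three inner columns, while the T-shaped component, whose top row acts like a headline, forms no top reservoir at all. This hints at why reservoir-based features can help distinguish script classes, though the paper's actual feature set and thresholds are not described here.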
Pages: 880-884
Number of pages: 5