Script line separation from Indian multi-script documents

Cited by: 16

Authors:

Pal, U [1]

Chaudhuri, BB [1]

Affiliations:

[1] Indian Statistical Institute, Computer Vision and Pattern Recognition Unit, Kolkata 700108, India

Source:

IETE Journal of Research | 2003 / Vol. 49 / No. 1

Keywords:

optical character recognition (OCR); document processing; Indian scripts and languages; multi-lingual and multi-script documents;

DOI:

10.1080/03772063.2003.11416318

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

In a multi-lingual country like India, a document page may contain more than one script form. Under the three-language formula, the document may be printed in English, Devnagari and one of the other Indian official languages. For Optical Character Recognition (OCR) of such a document page, it is necessary to separate these three script forms before feeding them to the OCRs of the individual scripts. In this paper, an automatic technique for separating the text lines is presented for almost all triplets of script forms. To do so, the triplets are grouped into five classes according to their characteristics, and shape-based features are employed to separate them without any expensive OCR-like algorithms. The proposed approaches are tested on many documents and the experimental results are presented. At present, the system has an overall accuracy of about 98.5%.

Pages: 3-11

Page count: 9

