Script line separation from Indian multi-script documents

Cited by: 16

Authors:

Pal, U [1]

Chaudhuri, BB [1]

Affiliation:

[1] Indian Statistical Institute, Computer Vision and Pattern Recognition Unit, Kolkata 700108, India

Source:

IETE Journal of Research | 2003 / Vol. 49 / No. 1

Keywords:

optical character recognition (OCR); document processing; Indian scripts and languages; multi-lingual and multi-script documents

DOI:

10.1080/03772063.2003.11416318

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline code:

0808; 0809

Abstract:

In a multilingual country like India, a document page may contain more than one script form. Under the three-language formula, a document may be printed in English, Devnagari, and one of the other Indian official languages. For Optical Character Recognition (OCR) of such a page, the text lines of these three script forms must be separated before they are fed to the OCR system for each individual script. This paper presents an automatic technique for separating the text lines for almost all triplets of script forms. To do so, the triplets are grouped into five classes according to their characteristics, and shape-based features are employed to separate them without any expensive OCR-like algorithms. The proposed approaches were tested on many documents, and the experimental results are presented. At present, the system has an overall accuracy of about 98.5%.
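The abstract does not list the individual shape-based features. As an illustration only, one cue of that kind is the headline (shirorekha) that joins the characters of a Devnagari word along the top, which Roman script lacks. The sketch below is not the authors' implementation: it assumes binarized line images given as lists of 0/1 rows, and the function names and the `min_fill` threshold are hypothetical.

```python
from typing import List, Sequence

Bitmap = Sequence[Sequence[int]]  # 1 = ink (black) pixel, 0 = background


def horizontal_projection(line: Bitmap) -> List[int]:
    """Count the ink pixels in each row of a binarized text-line image."""
    return [sum(row) for row in line]


def ink_column_span(line: Bitmap) -> int:
    """Width between the leftmost and rightmost columns that contain ink."""
    width = len(line[0]) if line else 0
    cols = [c for c in range(width) if any(row[c] for row in line)]
    return cols[-1] - cols[0] + 1 if cols else 0


def has_headline(line: Bitmap, min_fill: float = 0.7) -> bool:
    """Hypothetical headline test: the densest row is mostly ink and lies high.

    `min_fill` is an assumed threshold, not a value from the paper. On real
    lines, gaps between words lower the fill ratio, so it would need tuning.
    """
    span = ink_column_span(line)
    if span == 0:
        return False  # blank line: no evidence of a headline
    profile = horizontal_projection(line)
    peak_row = max(range(len(profile)), key=profile.__getitem__)
    # A headline should be nearly continuous and sit in the upper half of the line.
    return profile[peak_row] / span >= min_fill and peak_row < len(line) / 2


# Toy bitmaps: a solid top bar over vertical strokes vs. strokes with no bar.
devanagari_like = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
roman_like = [
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

print(has_headline(devanagari_like), has_headline(roman_like))
```

On the toy inputs, the first bitmap's top row covers its whole ink span, while the second bitmap's densest row covers only half of it, so only the first is flagged. A single cue like this can at most split headline-bearing scripts from the rest; separating a full script triplet would need further features.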


Pages: 3-11

Number of pages: 9
