Multi-skew detection of Indian script documents

Times cited: 15

Authors:

Pal, U [1]

Mitra, M [1]

Chaudhuri, BB [1]

Affiliation:

[1] Indian Statistical Institute, Computer Vision and Pattern Recognition Unit, Kolkata 35, West Bengal, India

Source:

SIXTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, PROCEEDINGS | 2001

DOI:

10.1109/ICDAR.2001.953801

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

There are many documents in which the text lines are not parallel to each other, i.e., the lines have different inclinations with respect to the horizontal (multi-skew documents). For the OCR of such a document, the skew angle of each individual text line has to be estimated, because a single rotation cannot de-skew all text lines of the document. In this paper, we describe a robust technique for multi-skew angle detection in Indian documents containing the most popular Indian scripts, Devnagari and Bangla. Most characters in these scripts have a horizontal line at the top, called the headline. The character headlines usually connect to one another within a word, so the word appears as a single connected component. In the proposed method, the connected components are first labeled and selected. The upper envelopes of the selected components are found by column-wise scanning from the top of each component. Portions of the upper envelope satisfying the properties of a digital straight line are detected. These portions are then clustered into groups, each belonging to a single text line. Estimates from these individual clusters give the skew angle of each text line. The proposed multi-skew detection technique has an accuracy of about 98.3%.
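The pipeline summarized in the abstract (label components, take each component's upper envelope, keep its straight portions, cluster them per text line, fit one skew angle per cluster) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The image is assumed to be a binary list of lists (1 = ink) with 8-connectivity. Components are selected with a hypothetical width threshold `min_width`. The straightness test is a simplified stand-in: it only checks that consecutive envelope steps use at most two adjacent directions (flat plus one of up or down), which is necessary but not sufficient for a digital straight line. The greedy clustering rule with tolerances `dist_tol` and `angle_tol` is also hypothetical, and each line's skew is taken from a least-squares fit. All function names are illustrative.

```python
import math
from collections import deque


def label_components(img):
    """Return the pixel lists, as (y, x), of the 8-connected ink components."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:  # breadth-first flood fill
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(pixels)
    return comps


def upper_envelope(pixels):
    """Column-wise scan from the top: topmost ink row per column, as (x, y) sorted by x."""
    top = {}
    for y, x in pixels:
        if x not in top or y < top[x]:
            top[x] = y
    return sorted(top.items())


def straight_runs(env, min_len=8):
    """Split the envelope into runs whose steps are 0 or one fixed +1/-1 (simplified test)."""
    runs, cur, step = [], [env[0]], 0
    for (x0, y0), (x1, y1) in zip(env, env[1:]):
        dy = y1 - y0
        if x1 == x0 + 1 and abs(dy) <= 1 and (dy == 0 or step in (0, dy)):
            step = dy or step  # remember the non-flat direction once seen
            cur.append((x1, y1))
        else:  # straightness broken: close the run and start a new one here
            if len(cur) >= min_len:
                runs.append(cur)
            cur, step = [(x1, y1)], 0
    if len(cur) >= min_len:
        runs.append(cur)
    return runs


def fit_line(points):
    """Least-squares line y = m*x + c through (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    m = sxy / sxx if sxx else 0.0
    return m, my - m * mx


def cluster_segments(runs, dist_tol=3.0, angle_tol=2.0):
    """Greedily add each run to the first cluster with a similar angle and vertical offset."""
    clusters = []
    for run in runs:
        m, c = fit_line(run)
        xm = run[len(run) // 2][0]  # compare the two lines at the run's middle column
        for cl in clusters:
            cm, cc = fit_line(cl)
            same_angle = abs(math.degrees(math.atan(m) - math.atan(cm))) <= angle_tol
            if same_angle and abs((cm * xm + cc) - (m * xm + c)) <= dist_tol:
                cl.extend(run)
                break
        else:
            clusters.append(list(run))
    return clusters


def multi_skew(img, min_width=10, min_len=8):
    """Estimated skew angle (degrees, image coordinates) of each detected text line."""
    runs = []
    for comp in label_components(img):
        xs = [x for _, x in comp]
        if max(xs) - min(xs) + 1 < min_width:  # hypothetical selection: skip narrow components
            continue
        runs.extend(straight_runs(upper_envelope(comp), min_len))
    return [math.degrees(math.atan(fit_line(cl)[0])) for cl in cluster_segments(runs)]
```

Angles are reported in image coordinates, where row indices grow downward, so a positive value means the line descends to the right. On a synthetic page with one horizontal line of "words" and a second line whose headlines drop one pixel every ten columns, the sketch returns two angles, one near 0° and one near atan(0.1) ≈ 5.7°.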

Pages: 292-296

Page count: 3