Multi-skew detection of Indian script documents

Cited by: 15
Authors
Pal, U [1]
Mitra, M [1]
Chaudhuri, BB [1]
Affiliations
[1] Indian Statistical Institute, Computer Vision and Pattern Recognition Unit, Kolkata 35, West Bengal, India
DOI
10.1109/ICDAR.2001.953801
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
There are many documents where text lines are not parallel to each other, i.e. these lines have different inclinations with the horizontal (multi-skew documents). For the OCR of such a document we have to estimate the skew angle of individual text lines because a single rotation cannot de-skew all text lines of the document. In this paper, we describe a robust technique for multi-skew angle detection from Indian documents containing the most popular Indian scripts, Devnagari and Bangla. Most characters in these scripts have horizontal lines at the top, called headlines. The character headlines usually connect to one another in a word, so the word appears as a single component. In the proposed method, the connected components are at first labeled and selected. The upper envelopes of selected components are found by column-wise scanning from the top of the component. Portions of the upper envelope satisfying the properties of a digital straight line are detected. They are then clustered into groups belonging to single text lines. Estimates from these individual clusters give the skew angle of each text line. The proposed multi-skew detection technique has an accuracy of about 98.3%.
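As a rough illustration of the envelope step described in the abstract, the sketch below scans one binary component column by column from the top and then estimates a skew angle from the resulting envelope. The abstract does not specify the digital-straight-line test or the clustering rule, so this sketch swaps in a plain least-squares line fit on a single component. The function names (`upper_envelope`, `skew_angle_degrees`, `make_tilted_word`) and the synthetic test word are assumptions made for illustration, not part of the paper.

```python
import numpy as np


def upper_envelope(component):
    """Return (cols, rows): the topmost foreground row in each non-empty column."""
    mask = np.asarray(component, dtype=bool)
    cols = np.flatnonzero(mask.any(axis=0))   # columns containing foreground pixels
    rows = mask[:, cols].argmax(axis=0)       # first True row, scanning from the top
    return cols, rows


def skew_angle_degrees(cols, rows):
    """Fit a least-squares line to the envelope (a stand-in for the paper's
    digital-straight-line test) and return its angle in degrees.
    Image rows grow downward, so the slope is negated: a line rising to the
    right gives a positive angle."""
    slope, _intercept = np.polyfit(cols, rows, 1)
    return float(np.degrees(np.arctan(-slope)))


def make_tilted_word(angle_deg, width=100, height=40):
    """Hypothetical synthetic 'word': a tilted headline with strokes hanging below it."""
    mask = np.zeros((height, width), dtype=bool)
    cols = np.arange(width)
    top = np.round(30 - cols * np.tan(np.radians(angle_deg))).astype(int)
    for c, r in zip(cols, top):
        mask[r:r + 2, c] = True        # two-pixel-thick headline
        if c % 7 == 0:
            mask[r:r + 8, c] = True    # vertical stroke below the headline
    return mask


if __name__ == "__main__":
    word = make_tilted_word(5.0)
    cols, rows = upper_envelope(word)
    print(f"estimated skew: {skew_angle_degrees(cols, rows):.2f} degrees")
```

For the synthetic word tilted by 5 degrees the estimate should come out close to 5 degrees. On a real page, this per-component estimate would still need the component selection and line-wise clustering that the abstract mentions before a per-line angle can be reported.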
Pages: 292-296
Page count: 5