Multi-skew detection of Indian script documents

Cited by: 15
Authors
Pal, U [1 ]
Mitra, M [1 ]
Chaudhuri, BB [1 ]
Affiliation
[1] Indian Stat Inst, Comp Vis & Pattern Recognit Unit, Kolkata 35, W Bengal, India
Source
SIXTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, PROCEEDINGS | 2001
DOI
10.1109/ICDAR.2001.953801
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
There are many documents where text lines are not parallel to each other, i.e., these lines have different inclinations with the horizontal lines (multi-skew documents). For the OCR of such a document we have to estimate the skew angle of individual text lines, because a single rotation cannot de-skew all text lines of the document. In this paper, we describe a robust technique for multi-skew angle detection from Indian documents containing the most popular Indian scripts, Devnagari and Bangla. Most characters in these scripts have horizontal lines at the top, called headlines. The character headlines usually connect to one another in a word, and the word appears as a single component. In the proposed method, the connected components are at first labeled and selected. The upper envelopes of selected components are found by column-wise scanning from the top of the component. Portions of the upper envelope satisfying the properties of a digital straight line are detected. They are then clustered into groups belonging to single text lines. Estimates from these individual clusters give the skew angle of each text line. The proposed multi-skew detection technique has an accuracy of about 98.3%.
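The two middle steps of the abstract (column-wise scanning for the upper envelope, then an angle estimate from it) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names `upper_envelope` and `skew_angle_degrees` are hypothetical, the component is assumed to be a small binary grid (list of rows, 1 = foreground), and an ordinary least-squares line fit stands in for the digital-straight-line test described in the abstract. Component labeling and clustering into text lines are not shown.

```python
import math


def upper_envelope(component):
    """Scan each column from the top and record the first foreground pixel.

    `component` is assumed to be a list of rows of 0/1 values.
    Returns a list of (column, row) points; empty columns are skipped.
    """
    n_rows = len(component)
    n_cols = len(component[0]) if n_rows else 0
    points = []
    for col in range(n_cols):
        for row in range(n_rows):
            if component[row][col]:
                points.append((col, row))
                break
    return points


def skew_angle_degrees(points):
    """Estimate the inclination of the envelope with a least-squares fit.

    Assumption: a plain line fit replaces the digital-straight-line
    detection of the paper. Image rows grow downward, so the slope is
    negated to make counter-clockwise angles positive.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two envelope points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.degrees(math.atan2(-sxy, sxx))
```

In a multi-skew setting, such an estimate would be computed per cluster of envelope segments rather than once per page, so that each text line receives its own angle.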
Pages: 292 - 296
Page count: 3