Multi-skew detection of Indian script documents

Cited by: 15
Authors
Pal, U [1 ]
Mitra, M [1 ]
Chaudhuri, BB [1 ]
Affiliation
[1] Indian Stat Inst, Comp Vis & Pattern Recognit Unit, Kolkata 35, W Bengal, India
Source
SIXTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, PROCEEDINGS | 2001
DOI
10.1109/ICDAR.2001.953801
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
There are many documents in which the text lines are not parallel to each other, i.e. the lines have different inclinations with respect to the horizontal (multi-skew documents). For the OCR of such a document we have to estimate the skew angle of individual text lines, because a single rotation cannot de-skew all text lines of the document. In this paper, we describe a robust technique for multi-skew angle detection from Indian documents containing the most popular Indian scripts, Devnagari and Bangla. Most characters in these scripts have horizontal lines at the top, called headlines. The character headlines usually connect with one another in a word, so the word appears as a single component. In the proposed method, the connected components are at first labeled and selected. The upper envelopes of the selected components are found by column-wise scanning from the top of the component. Portions of the upper envelope satisfying the properties of a digital straight line are detected. They are then clustered into groups belonging to single text lines. Estimates from these individual clusters give the skew angle of each text line. The proposed multi-skew detection technique has an accuracy of about 98.3%.
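The envelope step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract detects envelope portions that satisfy digital-straight-line properties and clusters them per text line, whereas this sketch swaps in a plain least-squares line fit on the upper envelope of a single, already-extracted component. The function names `upper_envelope` and `skew_angle_degrees` and the 0/1 list-of-rows image format are assumptions made for illustration.

```python
import math


def upper_envelope(component):
    """Return (column, row) of the topmost foreground pixel in each column.

    `component` is a list of equal-length rows of 0/1 values, with row 0
    at the top (hypothetical input format). Columns that contain no
    foreground pixel are skipped.
    """
    points = []
    if not component:
        return points
    for col in range(len(component[0])):
        # Column-wise scan from the top of the component.
        for row in range(len(component)):
            if component[row][col]:
                points.append((col, row))
                break
    return points


def skew_angle_degrees(points):
    """Estimate a skew angle from envelope points by least-squares fit.

    This is a stand-in for the digital-straight-line test named in the
    abstract. Image rows grow downward, so the sign is flipped to make a
    line that rises to the right give a positive angle.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two envelope points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.degrees(math.atan2(-sxy, sxx))


# Toy component whose top edge rises one row per column.
component = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
envelope = upper_envelope(component)
angle = skew_angle_degrees(envelope)
```

In a multi-skew setting, an estimate of this kind would be computed per text line after grouping the envelope segments, rather than once for the whole page.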
Pages: 292 - 296
Page count: 3