TEXT LINE DETECTION IN MULTICOLUMN FOR INDIAN SCRIPTS USING HISTOGRAM: A DOCUMENT IMAGE ANALYSIS APPLICATION

Authors
Kumar, Umesh [1 ]
Raheja, Jagdish [1 ]
Affiliations
[1] CSIR, CEERI, Digital Syst Grp, Pilani 333031, Rajasthan, India
Keywords
OCRed Document; Indian Scripts; OCR; Text Lines Detection and Presentation;
DOI
Not available
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
There are more than 1000 languages and 14 scripts used by 112 million people in India. Documents in all of these scripts can be divided into three parts: text blocks, image blocks, and table blocks. In the 21st century there is an obvious need to convert old printed documents into digital form. Converting them manually is a huge and difficult task, and it is prone to human error. An automated alternative is to use an optical character recognition (OCR) system to convert the entire printed document image into an editable document. In this paper, an effort has been made to develop an OCR technique that converts a printed document into an editable document. First, the scanned document is preprocessed for noise removal and skew correction. This is followed by text/non-text classification. Text line detection must then be performed in the text area. No available method can detect text lines when the image contains a multicolumn text area. The main contribution of this paper is to detect the text blocks and then detect the text lines within those blocks. A technique that extracts the text lines from a document image is presented. After the text lines are extracted, word segmentation, character segmentation, and template matching can be performed.
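The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common histogram (projection profile) approach to the block-then-line idea it describes. It assumes an already binarized, deskewed page in which ink pixels are 1 and background pixels are 0. The function names `find_runs` and `detect_text_lines` and the parameter `min_column_gap` are hypothetical and are not taken from the paper.

```python
import numpy as np


def find_runs(profile, min_gap=1):
    """Return (start, end) pairs of consecutive non-zero entries in a 1-D profile.

    Runs separated by fewer than min_gap zero entries are merged, so narrow
    gaps (for example between words) do not split a column.
    """
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
        elif start is not None:
            runs.append([start, i])
            start = None
    if start is not None:
        runs.append([start, len(profile)])

    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1][1] = run[1]
        else:
            merged.append(run)
    return [tuple(r) for r in merged]


def detect_text_lines(binary, min_column_gap=3):
    """Split a page into columns, then each column into text lines.

    binary: 2-D array, 1 = ink, 0 = background (an assumption of this sketch).
    Returns a list of (x_start, x_end, y_start, y_end) boxes, end-exclusive.
    """
    boxes = []
    # Vertical projection: ink count per pixel column; wide zero valleys
    # are treated as gutters between text columns.
    column_profile = binary.sum(axis=0)
    for x0, x1 in find_runs(column_profile, min_gap=min_column_gap):
        # Horizontal projection restricted to this column: ink count per row.
        row_profile = binary[:, x0:x1].sum(axis=1)
        for y0, y1 in find_runs(row_profile):
            boxes.append((x0, x1, y0, y1))
    return boxes


# Synthetic two-column page whose lines are vertically offset between columns.
page = np.zeros((10, 20), dtype=int)
page[1:3, 0:8] = 1    # column 1, line 1
page[5:7, 0:8] = 1    # column 1, line 2
page[2:4, 12:20] = 1  # column 2, line 1
page[6:9, 12:20] = 1  # column 2, line 2
boxes = detect_text_lines(page)
```

In the synthetic page the lines of the two columns overlap vertically, so a single horizontal histogram over the whole page would merge them. Splitting into columns first is what keeps the four lines separate in this sketch.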
Pages: 161-168
Number of pages: 8