Effective Printed Tamil Text Segmentation and Recognition Using Bayesian Classifier

Cited by: 1
Authors
Manisha, S. [1]
Sharmila, T. Sree [2]
Affiliations
[1] SSN Coll Engn, Dept CSE, Madras, Tamil Nadu, India
[2] SSN Coll Engn, Dept IT, Madras, Tamil Nadu, India
Keywords
Binarization; Bounding box; Character recognition; Classification; Dilation; Segmentation; Tamil text detection
DOI
10.1007/978-981-10-3874-7_69
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text segmentation and recognition for Indian languages have attracted considerable research interest in recent years. The large number of symbols and their varying characteristics make segmentation and extraction of text in these languages a challenging task. The Tamil language has a rich and varied literature, and printed text is available in forms such as newspapers, books, and magazines. In this paper, printed Tamil text is extracted from an image irrespective of text characteristics such as font style, color, and size. The proposed work takes a scanned image of printed Tamil text as input. The input image is binarized, since the text is always in the foreground, so that histograms can be used to segment it into lines and words. The morphological dilation operator is used to remove outliers such as dots and commas present in an underlying object and to segment the printed text into words, which facilitates text detection. Each character is then identified using the bounding box technique. Tamil letters are classified using features such as gradient and curvature information extracted from the grayscale and binary images. A Bayesian classifier is trained on these features and used to classify the characters. The recognized characters are written out as text in Unicode format. The performance of the approach is evaluated using precision, recall, and F-measure.
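The binarization and histogram-based segmentation steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed threshold of 128, the function names `binarize`, `runs`, and `segment`, and the `word_gap` parameter are all assumptions made for illustration. In particular, merging column runs separated by small gaps stands in for the dilation step the abstract mentions, and feature extraction and Bayesian classification are not covered.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, 0 (black) to 255 (white)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), half-open


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Mark dark pixels as foreground (1), light pixels as background (0).

    The fixed threshold is an assumption, not a value from the paper.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans of non-zero histogram entries.

    Spans separated by fewer than `min_gap` empty entries are merged.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, count in enumerate(profile):
        if count and start is None:
            start = i
        elif not count and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def segment(binary: Image, word_gap: int = 2) -> List[Box]:
    """Split into lines by row histogram, then into words by column histogram."""
    row_profile = [sum(row) for row in binary]
    boxes: List[Box] = []
    for top, bottom in runs(row_profile):
        line = binary[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]
        # Gaps narrower than word_gap are treated as gaps inside a word.
        for left, right in runs(col_profile, min_gap=word_gap):
            boxes.append((top, left, bottom, right))
    return boxes


if __name__ == "__main__":
    # Toy 5x9 "scan": two words on the first text line, one on the second.
    page = [[255] * 9 for _ in range(5)]
    for c in (0, 1, 3, 6, 7):
        page[1][c] = 0
    for c in (2, 3, 4):
        page[3][c] = 0
    print(segment(binarize(page)))
```

Each returned tuple is a bounding box around one word. A real scanned page would need a more robust threshold and noise handling than this toy input.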
Pages: 729-738
Number of pages: 10