Effective Printed Tamil Text Segmentation and Recognition Using Bayesian Classifier

Cited by: 1
Authors
Manisha, S. [1 ]
Sharmila, T. Sree [2 ]
Affiliations
[1] SSN Coll Engn, Dept CSE, Madras, Tamil Nadu, India
[2] SSN Coll Engn, Dept IT, Madras, Tamil Nadu, India
Keywords
Binarization; Bounding box; Character recognition; Classification; Dilation; Segmentation; Tamil text detection; CHARACTER-RECOGNITION;
DOI
10.1007/978-981-10-3874-7_69
CLC classification number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Text segmentation and recognition for Indian languages have attracted considerable research interest in recent years. The large number of symbols in these scripts and their varying characteristics make segmenting and extracting text a challenging task. Tamil has a rich literature, and printed Tamil text is available in many forms, such as newspapers, books, and magazines. In this paper, printed Tamil text is extracted from an image irrespective of text characteristics such as font style, color, and size. The proposed work takes a scanned image of printed Tamil text as input. The input image is binarized, since the text always lies in the foreground, and histograms are then used to segment it into lines and words. The morphological operator dilation is applied to remove outliers, such as dots and commas, present in an underlying object and to segment the printed text into words, which facilitates text detection. Each character is then located using a bounding box technique. To classify Tamil letters, features such as gradient information and curvature-based information are extracted from the grayscale and binary images. A Bayesian classifier is trained on these features and used to classify the characters. The recognized characters are stored as Unicode text. The performance of the approach is evaluated using precision, recall, and F-measure.
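The segmentation stage described above (binarization, histogram-based splitting into lines and words, then bounding boxes) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The fixed threshold of 128, the helper names `binarize`, `runs`, `merge_close`, and `segment`, and the `word_gap` parameter are all hypothetical. Merging column runs separated by small gaps is a one-dimensional stand-in for the 2-D dilation the paper uses.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values
Span = Tuple[int, int]   # inclusive (start, end) index range

def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map dark pixels (text) to 1 and light pixels (background) to 0.
    The fixed threshold is an assumption; the paper's method is not stated."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def runs(profile: List[int]) -> List[Span]:
    """Return inclusive ranges where a projection histogram is non-zero."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(profile) - 1))
    return spans

def merge_close(spans: List[Span], max_gap: int) -> List[Span]:
    """Join spans separated by at most max_gap blank columns.
    This is a 1-D simplification of dilation, not the paper's operator."""
    merged: List[Span] = []
    for s, e in spans:
        if merged and s - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged

def segment(binary: Image, word_gap: int = 2) -> List[List[Tuple[int, int, int, int]]]:
    """Split a binary page into lines, then each line into bounding boxes.
    Each box is (top, left, bottom, right). word_gap is a hypothetical tuning knob."""
    lines = []
    row_profile = [sum(row) for row in binary]             # horizontal histogram
    for top, bottom in runs(row_profile):
        band = binary[top:bottom + 1]
        col_profile = [sum(col) for col in zip(*band)]      # vertical histogram
        boxes = []
        for left, right in merge_close(runs(col_profile), word_gap):
            # Tighten the box vertically to the rows that contain ink.
            ink_rows = [r for r in range(top, bottom + 1)
                        if any(binary[r][left:right + 1])]
            boxes.append((ink_rows[0], left, ink_rows[-1], right))
        lines.append(boxes)
    return lines

page = [
    [255, 0,   255, 0,   255, 255, 255, 0],
    [255, 0,   255, 0,   255, 255, 255, 0],
    [255, 255, 255, 255, 255, 255, 255, 255],
    [0,   0,   255, 255, 255, 255, 0,   255],
]
print(segment(binarize(page)))
# → [[(0, 1, 1, 3), (0, 7, 1, 7)], [(3, 0, 3, 1), (3, 6, 3, 6)]]
```

With `word_gap=0`, no runs are merged, so each box roughly covers one connected column run. This is closer to the character-level bounding boxes the abstract mentions, although touching glyphs would still share a box.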
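The classification and evaluation steps can be sketched in the same spirit. The abstract says only that a "Bayesian classifier" is used. Gaussian naive Bayes, which treats features as independent and normally distributed within each class, is one common form and is assumed here. The two-element feature vectors are toy stand-ins for the paper's gradient and curvature features, and the names `GaussianNaiveBayes` and `precision_recall_f` are hypothetical. Class labels are plain Unicode strings, so the predicted characters can be joined directly into Unicode text.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

class GaussianNaiveBayes:
    """Per-class Gaussian likelihood for each feature, features assumed independent."""

    def __init__(self, var_floor: float = 1e-6):
        self.var_floor = var_floor  # avoids division by zero for constant features

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNaiveBayes":
        groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        self.params: Dict[str, Tuple[float, List[float], List[float]]] = {}
        for label, rows in groups.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            variances = [max(sum((v - m) ** 2 for v in col) / n, self.var_floor)
                         for col, m in zip(columns, means)]
            # Store log prior, per-feature means, per-feature variances.
            self.params[label] = (math.log(n / len(y)), means, variances)
        return self

    def predict(self, x: Sequence[float]) -> str:
        def log_posterior(label: str) -> float:
            log_prior, means, variances = self.params[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(x, means, variances))
        # Pick the class with the highest (unnormalized) log posterior.
        return max(self.params, key=log_posterior)

def precision_recall_f(truth: Sequence[str], predicted: Sequence[str],
                       label: str) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure for one character class."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Toy training data: hypothetical 2-D feature vectors for two Tamil letters.
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
y = ["க", "க", "ங", "ங"]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([0.15, 0.8]))  # → க
print(tuple(round(v, 3) for v in precision_recall_f(["க", "க", "ங"], ["க", "ங", "ங"], "க")))
# → (1.0, 0.5, 0.667)
```

Working in log space avoids underflow when many per-feature likelihoods are multiplied. The variance floor keeps a feature that is constant within one class from producing a zero variance.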
Pages: 729-738
Number of pages: 10