Segmenting Characters from Malayalam Handwritten Documents

Cited by: 0
Authors
Hashrin, C. P. [1]
Jossy, Amal [1]
Sudhakaran, K. [1]
Thushara, A. [1]
John, Ansamma [1]
Affiliations
[1] TKM Coll Engn, Dept Comp Sci & Engn, Kollam, Kerala, India
Keywords
OCR; segmentation; recognition
DOI
10.1109/iciict1.2019.8741416
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Building an Optical Character Recognition (OCR) model for handwritten documents poses many challenges, most prominently dataset collection, character segmentation, and classification. This paper focuses on segmentation and presents a novel approach to segmenting individual characters from Malayalam handwritten documents. It is a three-stage approach in which morphological operations, contour analysis, and bounding box detection are used to extract individual lines from the document, then words from each line, and finally characters from each word. An additional masking step handles bounding boxes that overlap because of skewed lines or the presence of diacritics. The segmented characters can be used either to create datasets or as input to OCR models.
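
The abstract gives no implementation details, so the following is only a minimal sketch of how a three-stage, dilation-based segmentation with masking might look, not the authors' method. It assumes a binarized page stored as a list of rows of 0/1 values, uses 8-connected component labelling as a stand-in for contour analysis, and introduces hypothetical names (`dilate`, `components`, `masked_crop`, `split`, `segment`) and hypothetical kernel sizes that would need tuning on real scans.

```python
from collections import deque

def dilate(img, kh, kw):
    """Binary dilation with a kh x kw rectangular kernel (odd sizes assumed)."""
    h, w = len(img), len(img[0])
    rh, rw = kh // 2, kw // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for yy in range(max(0, y - rh), min(h, y + rh + 1)):
                    for xx in range(max(0, x - rw), min(w, x + rw + 1)):
                        out[yy][xx] = 1
    return out

def components(img):
    """Return the 8-connected foreground regions as sets of (row, col)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), set()
                while queue:
                    cy, cx = queue.popleft()
                    pixels.add((cy, cx))
                    for ny in range(max(0, cy - 1), min(h, cy + 2)):
                        for nx in range(max(0, cx - 1), min(w, cx + 2)):
                            if img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                regions.append(pixels)
    return regions

def masked_crop(img, region):
    """Crop the region's bounding box, zeroing ink outside the region.

    The masking keeps a neighbouring line or diacritic that intrudes into
    an overlapping bounding box out of the crop.
    """
    ys = [y for y, _ in region]
    xs = [x for _, x in region]
    top, left = min(ys), min(xs)
    crop = [[img[y][x] if (y, x) in region else 0
             for x in range(left, max(xs) + 1)]
            for y in range(top, max(ys) + 1)]
    return top, left, crop

def split(img, kh, kw, axis):
    """Dilate to merge nearby ink, then cut out each merged region.

    axis=0 orders pieces top to bottom, axis=1 orders them left to right.
    """
    pieces = [masked_crop(img, r) for r in components(dilate(img, kh, kw))]
    pieces.sort(key=lambda p: p[axis])
    return [crop for _, _, crop in pieces]

def segment(page, line_kernel=(1, 11), word_kernel=(1, 3)):
    """Return lines -> words -> characters as nested lists of crops.

    Kernel sizes are illustrative assumptions, not values from the paper.
    """
    return [[split(word, 1, 1, axis=1)
             for word in split(line, *word_kernel, axis=1)]
            for line in split(page, *line_kernel, axis=0)]
```

A wide, flat kernel merges all ink on a text line into one region, a narrower one merges only the characters of a word, and a 1 x 1 kernel leaves the characters as they are. In this simplified form a diacritic that is not connected to its base glyph would come out as a separate character, so a real pipeline would need an extra grouping rule at the last stage.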
Pages: 6