Unsupervised Text Binarization in Handwritten Historical Documents Using k-Means Clustering

Author
Kusetogullari, Huseyin [1]
Affiliation
[1] Blekinge Inst Technol, Dept Comp Sci & Engn, S-37141 Karlskrona, Sweden
Keywords
Handwritten text binarization; Image processing; k-means clustering; Document images; Image binarization; Enhancement; Algorithm
DOI
10.1007/978-3-319-56991-8_3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel technique for unsupervised text binarization in handwritten historical documents using k-means clustering. Text binarization faces many challenges, such as noise, faint characters and bleed-through, which must be overcome to increase the correct detection rate. To address these problems, a preprocessing strategy is first used to enhance the contrast and thereby improve faint characters, and a Gaussian Mixture Model (GMM) is used to disregard noise and other artifacts in the handwritten historical documents. The enhanced image is then normalized for use in the postprocessing part of the proposed method. The binarized handwritten image is obtained by partitioning the normalized pixel values of the handwritten image into two clusters using k-means clustering with k = 2, assigning each normalized pixel to the cluster whose mean normalized pixel value has the minimum Euclidean distance to that pixel's normalized intensity. Experimental results verify the effectiveness of the proposed approach.
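The final clustering step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it omits the contrast enhancement and GMM stages entirely, and the function name `binarize_kmeans`, the min-max normalization, the initialization of the two centers at 0 and 1, and the convention that the darker cluster is ink are all assumptions made for illustration.

```python
import numpy as np


def binarize_kmeans(image, max_iter=100, tol=1e-6):
    """Split grayscale pixels into two clusters with 1-D k-means (k = 2).

    Hypothetical helper: returns a boolean mask that is True for pixels
    in the darker cluster (assumed here to be ink).
    """
    pixels = np.asarray(image, dtype=np.float64)
    lo, hi = pixels.min(), pixels.max()
    if hi == lo:
        # Flat image: nothing to separate, so mark everything as background.
        return np.zeros(pixels.shape, dtype=bool)

    # Min-max normalization to [0, 1] (an assumed normalization scheme).
    norm = (pixels - lo) / (hi - lo)

    # Assumed initialization: one center at each end of the range.
    centers = np.array([0.0, 1.0])
    for _ in range(max_iter):
        # Assign each pixel to the nearest center; in one dimension the
        # Euclidean distance is the absolute difference.
        labels = np.abs(norm[..., None] - centers).argmin(axis=-1)

        # Move each center to the mean of the pixels assigned to it.
        new_centers = centers.copy()
        for k in range(2):
            members = norm[labels == k]
            if members.size:
                new_centers[k] = members.mean()

        converged = np.abs(new_centers - centers).max() < tol
        centers = new_centers
        if converged:
            break

    # Final assignment with the converged centers.
    labels = np.abs(norm[..., None] - centers).argmin(axis=-1)
    return labels == centers.argmin()


if __name__ == "__main__":
    page = np.array([[10, 12, 200], [205, 11, 198]])
    print(binarize_kmeans(page))
```

With k = 2 on scalar intensities, the nearest-mean assignment is equivalent to thresholding at the midpoint between the two cluster means, which is why this step behaves like a data-driven global threshold.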
Pages: 23-32
Page count: 10