Unsupervised Text Binarization in Handwritten Historical Documents Using k-Means Clustering

Authors
Kusetogullari, Huseyin [1]
Affiliations
[1] Blekinge Inst Technol, Dept Comp Sci & Engn, S-37141 Karlskrona, Sweden
Keywords
Handwritten text binarization; Image processing; k-means clustering; Document images; Image binarization; Enhancement; Algorithm
DOI
10.1007/978-3-319-56991-8_3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel technique for unsupervised text binarization in handwritten historical documents using k-means clustering. Text binarization faces many challenges, such as noise, faint characters and bleed-through, and these must be overcome to increase the correct detection rate. To address them, a preprocessing strategy is first used to enhance the contrast and strengthen faint characters, and a Gaussian Mixture Model (GMM) is used to exclude noise and other artifacts in the handwritten historical documents. The enhanced image is then normalized for use in the postprocessing part of the proposed method. The binarized handwritten image is obtained by partitioning the normalized pixel values of the handwritten image into two clusters using k-means clustering with k = 2, and then assigning each normalized pixel to one of the two clusters according to the minimum Euclidean distance between the normalized pixel intensity and the mean normalized pixel value of each cluster. Experimental results verify the effectiveness of the proposed approach.
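The final clustering step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it skips the contrast enhancement and GMM stages entirely, assumes min-max scaling as the normalization, and assumes the two cluster means are initialized at the darkest and brightest normalized values. The function names `normalize` and `kmeans_binarize` are hypothetical.

```python
def normalize(pixels):
    """Min-max scale intensities to [0, 1] (an assumed normalization)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]


def kmeans_binarize(image, max_iter=100):
    """Binarize a grayscale image (list of equal-length rows) with k-means, k = 2.

    Returns an image of the same shape holding 0 for the darker cluster
    (assumed to be ink) and 1 for the brighter cluster (assumed background).
    """
    width = len(image[0])
    values = normalize([p for row in image for p in row])

    # Assumed initialization: start the two means at the extremes.
    dark, bright = min(values), max(values)
    labels = [0] * len(values)

    for _ in range(max_iter):
        # Assignment step: for scalar intensities the Euclidean distance
        # to a cluster mean is simply the absolute difference.
        labels = [0 if abs(v - dark) <= abs(v - bright) else 1 for v in values]

        # Update step: recompute each mean, keeping the old one if a
        # cluster happens to be empty.
        dark_members = [v for v, lab in zip(values, labels) if lab == 0]
        bright_members = [v for v, lab in zip(values, labels) if lab == 1]
        new_dark = sum(dark_members) / len(dark_members) if dark_members else dark
        new_bright = (sum(bright_members) / len(bright_members)
                      if bright_members else bright)

        if new_dark == dark and new_bright == bright:
            break  # means stopped moving, so the labels are final
        dark, bright = new_dark, new_bright

    # Reshape the flat label list back into rows.
    return [labels[i:i + width] for i in range(0, len(labels), width)]
```

Calling `kmeans_binarize` on a small nested list of gray levels returns a nested list of 0/1 labels with the same shape. On real scans, the preprocessing that the abstract describes would run before this step.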
Pages: 23-32
Number of pages: 10
Related papers
50 items in total
  • [21] An improved preconditioned unsupervised K-means clustering algorithm
    Sun, Tiantian
    Peng, Xiaofei
    Ge, Wenxiu
    Xu, Weiwei
    COMPUTATIONAL STATISTICS, 2025,
  • [22] Text Enhancement for Historical Handwritten Documents
    Alaasam, Reem
    Madi, Boraq
    El-Sana, Jihad
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II, 2024, 14805 : 397 - 412
  • [23] Non-uniform Illumination Document Image Binarization Using K-Means Clustering Algorithm
    Yang, Xingxin
    Wan, Yi
    2021 IEEE 9TH INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATION AND NETWORKS (ICICN 2021), 2021, : 555 - 559
  • [24] A HYBRID APPROACH USING PSO AND K-MEANS FOR SEMANTIC CLUSTERING OF WEB DOCUMENTS
    Avanija, J.
    Ramar, K.
    JOURNAL OF WEB ENGINEERING, 2013, 12 (3-4): : 249 - 264
  • [25] Clustering of Image Data Using K-Means and Fuzzy K-Means
    Rahmani, Md. Khalid Imam
    Pal, Naina
    Arora, Kamiya
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 160 - 163
  • [26] Chinese text clustering algorithm based k-means
    Yao, Mingyu
    Pi, Dechang
    Cong, Xiangxiang
    2012 INTERNATIONAL CONFERENCE ON MEDICAL PHYSICS AND BIOMEDICAL ENGINEERING (ICMPBE2012), 2012, 33 : 301 - 307
  • [27] Text Document Clustering Based on Density K-means
    Wu, Di
    Zeng, Yan
    Qu, Yin-chuan
    INTERNATIONAL CONFERENCE ON COMPUTER, MECHATRONICS AND ELECTRONIC ENGINEERING (CMEE 2016), 2016,
  • [28] Chinese Text Clustering Algorithm Based K-Means
    Yao, Mingyu
    Pi, Dechang
    Cong, Xiangxiang
    2011 AASRI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INDUSTRY APPLICATION (AASRI-AIIA 2011), VOL 1, 2011, : 90 - 93
  • [29] Weighted k-Means Algorithm Based Text Clustering
    Chen, Xiuguo
    Yin, Wensheng
    Tu, Pinghui
    Zhang, Hengxi
    IEEC 2009: FIRST INTERNATIONAL SYMPOSIUM ON INFORMATION ENGINEERING AND ELECTRONIC COMMERCE, PROCEEDINGS, 2009, : 51 - +
  • [30] Improved K-Means algorithm in text semantic clustering
    Ma, Junhong
    Open Cybernetics and Systemics Journal, 2014, 8 : 530 - 534