Touching Character Segmentation Method for Chinese Historical Documents

Authors
Sun, Xiaolu [1]
Peng, Liangrui [1]
Ding, Xiaoqing [1]
Affiliations
[1] Tsinghua Univ, State Key Lab Intelligent Technol & Syst, Tsinghua Natl Lab Informat Sci & Technol, Dept Elect Engn, Beijing 100084, Peoples R China
Source
Keywords
Chinese historical document; character segmentation; touching characters; local dynamic programming;
DOI
10.1117/12.840251
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
OCR for Chinese historical documents remains an open problem. Because these documents are handwritten or hand-carved in various styles, overlapped and touching characters make character segmentation very difficult. This paper presents an over-segmentation-based method for handling overlapped and touching Chinese characters in historical documents. The segmentation process has two parts: over-segmentation and segmenting-path optimization. In the first part, touching strokes are found and segmented by analyzing the geometric information of the white and black connected components. The segmentation cost of each touching stroke is estimated from the shape and location of the connected components and from the touching stroke width. The second part uses local-optimization dynamic programming to find the best segmenting path: an HMM represents the multiple choices of segmenting paths, and the Viterbi algorithm searches for a locally optimal solution. Experimental results on real-world Chinese documents show that the proposed method is effective.
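The path-optimization step can be illustrated with a small Viterbi-style search. This is only a sketch under stated assumptions, not the authors' implementation: the abstract gives no cost formulas, so the function name `viterbi_select_cuts`, the parameters `expected_size` and `size_weight`, and the transition cost (relative deviation of the gap between consecutive cuts from an expected character size) are all hypothetical. Each step holds the candidate cuts for one touching location, and the search picks one candidate per step.

```python
from typing import List, Tuple

# (cut position along the text line, local cut cost); both values are assumed inputs
Candidate = Tuple[float, float]


def viterbi_select_cuts(
    steps: List[List[Candidate]],
    expected_size: float,
    size_weight: float = 1.0,
) -> Tuple[float, List[int]]:
    """Choose one candidate per step so that the sum of local cut costs
    and size-deviation transition costs is minimal (min-cost Viterbi)."""
    if not steps or any(not step for step in steps):
        raise ValueError("every step needs at least one candidate cut")

    # best[j]: minimal cost of any path ending in candidate j of the current step
    best = [cost for _, cost in steps[0]]
    back: List[List[int]] = []  # back[t][j]: best predecessor of candidate j

    for prev, cur in zip(steps, steps[1:]):
        new_best, pointers = [], []
        for pos, cost in cur:
            # Hypothetical transition cost: penalize gaps that deviate
            # from the expected character size.
            options = [
                best[i] + size_weight * abs((pos - prev_pos) - expected_size) / expected_size
                for i, (prev_pos, _) in enumerate(prev)
            ]
            i_min = min(range(len(options)), key=options.__getitem__)
            new_best.append(options[i_min] + cost)
            pointers.append(i_min)
        best = new_best
        back.append(pointers)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, path


if __name__ == "__main__":
    steps = [
        [(0.0, 0.0)],
        [(9.0, 0.5), (10.0, 0.2)],
        [(20.0, 0.1), (24.0, 0.0)],
    ]
    print(viterbi_select_cuts(steps, expected_size=10.0))
```

The returned list gives the index of the chosen candidate at each step. In a real system the local costs would come from the connected-component shape, location, and stroke-width analysis described in the abstract.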
Pages: 8