Touching Character Segmentation Method for Chinese Historical Documents

Authors
Sun, Xiaolu [1]
Peng, Liangrui [1]
Ding, Xiaoqing [1]
Affiliation
[1] Tsinghua Univ, State Key Lab Intelligent Technol & Syst, Tsinghua Natl Lab Informat Sci & Technol, Dept Elect Engn, Beijing 100084, Peoples R China
Keywords
Chinese historical document; character segmentation; touching characters; local dynamic programming
DOI
10.1117/12.840251
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
OCR for Chinese historical documents is still an open problem. Because these documents are handwritten or hand-carved in various styles, overlapped and touching characters make character segmentation very difficult. This paper presents an over-segmentation-based method to handle overlapped and touching Chinese characters in historical documents. The segmentation process has two parts: over-segmentation and segmenting path optimization. In the first part, touching strokes are found and segmented by analyzing the geometric information of the white and black connected components. The segmentation cost of the touching strokes is estimated from the shape and location of the connected components, as well as the touching stroke width. The second part uses local optimization by dynamic programming to find the best segmenting path. An HMM is used to express the multiple choices of segmenting paths, and the Viterbi algorithm is used to search for a locally optimal solution. Experimental results on practical Chinese documents show that the proposed method is effective.
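The path-optimization step described in the abstract can be illustrated with a generic minimum-cost Viterbi search. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `viterbi_min_cost`, the two-state encoding (0 = keep the stroke joined, 1 = cut), the cost values, and the penalty for two consecutive cuts are all hypothetical, since the abstract does not give the actual HMM states or cost formulas.

```python
from typing import Callable, List, Sequence, Tuple


def viterbi_min_cost(
    stage_costs: Sequence[Sequence[float]],
    transition_cost: Callable[[int, int, int], float],
) -> Tuple[float, List[int]]:
    """Return (total cost, chosen state per stage) of the cheapest path.

    stage_costs[t][s] is the cost of choosing state s at stage t.
    transition_cost(t, prev, cur) is the cost of moving from state `prev`
    at stage t-1 to state `cur` at stage t.
    """
    best = list(stage_costs[0])      # cheapest cost of reaching each state
    back: List[List[int]] = []       # back-pointers, one list per later stage
    for t in range(1, len(stage_costs)):
        new_best, pointers = [], []
        for cur, cost in enumerate(stage_costs[t]):
            # Pick the predecessor that minimizes accumulated + transition cost.
            prev = min(
                range(len(best)),
                key=lambda p: best[p] + transition_cost(t, p, cur),
            )
            new_best.append(best[prev] + transition_cost(t, prev, cur) + cost)
            pointers.append(prev)
        best = new_best
        back.append(pointers)

    # Trace the cheapest final state back to the first stage.
    last = min(range(len(best)), key=best.__getitem__)
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


# Hypothetical example: three candidate touching points in one text column.
# Each row holds [cost of keeping, cost of cutting]; the values are made up.
candidate_costs = [[0.2, 1.0], [0.9, 0.1], [0.3, 0.8]]


def consecutive_cut_penalty(t: int, prev: int, cur: int) -> float:
    # Assumed rule: discourage two cuts in a row (a very short "character").
    return 0.5 if prev == 1 and cur == 1 else 0.0


cost, path = viterbi_min_cost(candidate_costs, consecutive_cut_penalty)
```

With these made-up costs, the search keeps the first and third candidates joined and cuts only at the second. In a real system the per-candidate costs would come from the connected-component shape, location, and stroke-width estimates mentioned in the abstract.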
Pages: 8