Touching Character Segmentation Method for Chinese Historical Documents

Authors
Sun, Xiaolu [1]
Peng, Liangrui [1]
Ding, Xiaoqing [1]
Affiliation
[1] Tsinghua Univ, State Key Lab Intelligent Technol & Syst, Tsinghua Natl Lab Informat Sci & Technol, Dept Elect Engn, Beijing 100084, Peoples R China
Keywords
Chinese historical document; character segmentation; touching characters; local dynamic programming
DOI
10.1117/12.840251
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
OCR for Chinese historical documents remains an open problem. Because these documents are handwritten or hand-carved in various styles, overlapping and touching characters make character segmentation very difficult. This paper presents an over-segmentation-based method for handling overlapping and touching Chinese characters in historical documents. The segmentation process consists of two parts: over-segmentation and segmenting-path optimization. In the first part, touching strokes are located and split by analyzing the geometric information of the white and black connected components. The segmentation cost of each touching stroke is estimated from the shape and location of the connected components, as well as the width of the touching stroke. The second part uses dynamic programming with local optimization to find the best segmenting path: an HMM represents the multiple choices of segmenting paths, and the Viterbi algorithm searches for a locally optimal solution. Experimental results on real Chinese documents show that the proposed method is effective.
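The path-optimization idea in the abstract can be illustrated with a small Viterbi-style dynamic program over candidate cut points. This is only a sketch under stated assumptions, not the authors' method: the function name `best_segmentation`, the cost values, and the size-based segment cost are all hypothetical, and the paper's HMM formulation and local optimization are not reproduced here. Each candidate cut produced by over-segmentation carries a cost, each resulting segment is penalized for deviating from an assumed character size, and the cheapest sequence of cuts is recovered by backtracking.

```python
from typing import Callable, List, Sequence, Tuple


def best_segmentation(
    cuts: Sequence[int],
    cut_cost: Sequence[float],
    segment_cost: Callable[[int], float],
) -> Tuple[float, List[int]]:
    """Pick the cheapest subset of candidate cuts (hypothetical helper).

    cuts         : sorted candidate cut positions; the first and last
                   entries are the fixed boundaries of the text line.
    cut_cost     : cost of cutting at each position (0 at the boundaries).
    segment_cost : cost of a segment as a function of its length.
    """
    n = len(cuts)
    best = [float("inf")] * n   # best[j]: cheapest cost of a path ending at cut j
    back = [-1] * n             # back[j]: previous cut on that cheapest path
    best[0] = cut_cost[0]

    for j in range(1, n):
        for i in range(j):
            cost = best[i] + segment_cost(cuts[j] - cuts[i]) + cut_cost[j]
            if cost < best[j]:
                best[j] = cost
                back[j] = i

    # Backtrack from the final boundary to recover the chosen cuts.
    path = []
    k = n - 1
    while k != -1:
        path.append(cuts[k])
        k = back[k]
    path.reverse()
    return best[-1], path


# Assumed toy data: a text line 200 pixels long, expected character size 100.
EXPECTED_SIZE = 100
cuts = [0, 48, 100, 130, 200]
cut_cost = [0, 5, 1, 6, 0]      # a low cost marks a thin, easily split touching stroke

total, path = best_segmentation(
    cuts, cut_cost, lambda length: abs(length - EXPECTED_SIZE) / 10
)
print(total, path)  # 1.0 [0, 100, 200]
```

In this toy case the cut at position 100 is chosen because it is cheap to cut and yields two segments of the expected size; the candidates at 48 and 130 are rejected because they would produce badly sized segments.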
Pages: 8