High Performance Offline Handwritten Chinese Text Recognition with a New Data Preprocessing and Augmentation Pipeline

Cited by: 13
Authors
Xie, Canyu [1]
Lai, Songxuan [1]
Liao, Qianying [1]
Jin, Lianwen [1, 2]
Affiliations
[1] South China Univ Technol, Coll Elect & Informat Engn, Guangzhou, Peoples R China
[2] SCUT, Zhuhai Inst Modern Ind Innovat, Zhuhai 519000, Peoples R China
Source
DOCUMENT ANALYSIS SYSTEMS | 2020 / Vol. 12116
Keywords
Offline Handwritten Chinese Text Recognition (HCTR); Data preprocessing; Data augmentation; CNN-ResLSTM; NEURAL-NETWORK; SEQUENCE; ONLINE; MODEL;
DOI
10.1007/978-3-030-57058-3_4
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Offline handwritten Chinese text recognition (HCTR) has been a long-standing research topic. Building robust, high-performance offline HCTR systems naturally calls for data preprocessing and augmentation techniques, yet these have not been fully explored. In this paper, we propose a data preprocessing and augmentation pipeline and a CNN-ResLSTM model for high-performance offline HCTR. The pipeline consists of three steps: training text sample generation, text sample preprocessing, and text sample synthesis. The CNN-ResLSTM model is derived by introducing residual connections into the RNN part of the CRNN architecture. Experiments show that, with the proposed CNN-ResLSTM, the data preprocessing and augmentation pipeline effectively and robustly improves system performance: on two standard benchmarks, namely CASIA-HWDB and the ICDAR-2013 handwriting competition dataset, the proposed approach achieves state-of-the-art results with correct rates of 97.28% and 96.99%, respectively. Furthermore, to make the model more practical, we employ model acceleration and compression techniques to build a fast and compact model without sacrificing accuracy.
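The abstract reports results as "correct rates" but this record does not define the metric. The sketch below assumes the definitions conventionally used for handwritten Chinese text benchmarks: correct rate CR = (N - De - Se) / N and accurate rate AR = (N - De - Se - Ie) / N, where N is the number of ground-truth characters and Se, De, Ie are the substitution, deletion, and insertion errors of a minimum edit-distance alignment. The function names `edit_counts` and `correct_and_accurate_rate` are hypothetical and not taken from the paper.

```python
from typing import Iterable, Tuple


def edit_counts(reference: str, hypothesis: str) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-cost alignment."""
    n, m = len(reference), len(hypothesis)
    # dp[i][j] = (total_cost, subs, dels, ins) for reference[:i] vs hypothesis[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference character deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis character inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, subs, dels, ins = dp[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (cost, subs, dels, ins)          # match
            else:
                diagonal = (cost + 1, subs + 1, dels, ins)  # substitution
            cost, subs, dels, ins = dp[i - 1][j]
            deletion = (cost + 1, subs, dels + 1, ins)
            cost, subs, dels, ins = dp[i][j - 1]
            insertion = (cost + 1, subs, dels, ins + 1)
            # Tuples compare by total cost first, so min() keeps a cheapest path.
            dp[i][j] = min(diagonal, deletion, insertion)
    _, subs, dels, ins = dp[n][m]
    return subs, dels, ins


def correct_and_accurate_rate(pairs: Iterable[Tuple[str, str]]) -> Tuple[float, float]:
    """Compute (CR, AR) over (ground_truth, prediction) text-line pairs.

    Assumed definitions (not stated in this record):
      CR = (N - De - Se) / N
      AR = (N - De - Se - Ie) / N
    """
    total = subs = dels = ins = 0
    for truth, predicted in pairs:
        s, d, i = edit_counts(truth, predicted)
        total += len(truth)
        subs += s
        dels += d
        ins += i
    if total == 0:
        raise ValueError("no ground-truth characters to evaluate")
    correct_rate = (total - dels - subs) / total
    accurate_rate = (total - dels - subs - ins) / total
    return correct_rate, accurate_rate
```

Under these assumed definitions, insertion errors lower AR but leave CR unchanged, so a prediction with one spurious extra character still scores a CR of 1.0 for that line.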
Pages: 45 - 59
Page count: 15