High Performance Offline Handwritten Chinese Text Recognition with a New Data Preprocessing and Augmentation Pipeline

Times cited: 13
Authors
Xie, Canyu [1]
Lai, Songxuan [1]
Liao, Qianying [1]
Jin, Lianwen [1,2]
Affiliations
[1] South China Univ Technol, Coll Elect & Informat Engn, Guangzhou, Peoples R China
[2] SCUT, Zhuhai Inst Modern Ind Innovat, Zhuhai 519000, Peoples R China
Source
DOCUMENT ANALYSIS SYSTEMS, 2020, Vol. 12116
Keywords
Offline Handwritten Chinese Text Recognition (HCTR); Data preprocessing; Data augmentation; CNN-ResLSTM; NEURAL-NETWORK; SEQUENCE; ONLINE; MODEL
DOI
10.1007/978-3-030-57058-3_4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Offline handwritten Chinese text recognition (HCTR) has been a long-standing research topic. To build robust and high-performance offline HCTR systems, it is natural to develop data preprocessing and augmentation techniques, which, however, have not been fully explored. In this paper, we propose a data preprocessing and augmentation pipeline and a CNN-ResLSTM model for high-performance offline HCTR. The pipeline consists of three steps: training text sample generation, text sample preprocessing, and text sample synthesis. The CNN-ResLSTM model is derived by introducing residual connections into the RNN part of the CRNN architecture. Experiments show that, with the proposed CNN-ResLSTM, the data preprocessing and augmentation pipeline effectively and robustly improves system performance: on two standard benchmarks, the CASIA-HWDB and the ICDAR-2013 handwriting competition dataset, the proposed approach achieves state-of-the-art results with correct rates of 97.28% and 96.99%, respectively. Furthermore, to make the model more practical, we employ model acceleration and compression techniques to build a fast and compact model without sacrificing accuracy.
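The abstract reports its benchmark results as correct rates (CR). In the HCTR literature, including the ICDAR 2013 Chinese handwriting competition protocol, CR and the companion accurate rate (AR) are computed from a character-level edit-distance alignment between the recognized string and the ground truth: CR = (N_t - S_e - D_e) / N_t and AR = (N_t - S_e - D_e - I_e) / N_t, where N_t is the number of ground-truth characters and S_e, D_e, I_e are the substitution, deletion and insertion errors. The Python sketch below assumes those standard definitions; the function names `count_edit_errors` and `correct_and_accurate_rate` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Each DP cell stores (total_edits, substitutions, deletions, insertions).
Cell = Tuple[int, int, int, int]


def count_edit_errors(ref: str, hyp: str) -> Tuple[int, int, int]:
    """Return (S_e, D_e, I_e) from one minimum edit-distance alignment of hyp to ref."""
    n, m = len(ref), len(hyp)
    dp: List[List[Cell]] = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # all reference characters deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # all hypothesis characters inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, s, d, ins = dp[i - 1][j - 1]
            miss = int(ref[i - 1] != hyp[j - 1])
            match_or_sub = (cost + miss, s + miss, d, ins)
            cost, s, d, ins = dp[i - 1][j]
            deletion = (cost + 1, s, d + 1, ins)
            cost, s, d, ins = dp[i][j - 1]
            insertion = (cost + 1, s, d, ins + 1)
            # Lowest total cost wins; ties fall back to tuple order
            # (fewer substitutions, then fewer deletions). This tie-break is arbitrary.
            dp[i][j] = min(match_or_sub, deletion, insertion)
    _, s, d, ins = dp[n][m]
    return s, d, ins


def correct_and_accurate_rate(ref: str, hyp: str) -> Tuple[float, float]:
    """CR = (N_t - S_e - D_e) / N_t and AR = (N_t - S_e - D_e - I_e) / N_t."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    s, d, ins = count_edit_errors(ref, hyp)
    n_t = len(ref)
    cr = (n_t - s - d) / n_t
    ar = (n_t - s - d - ins) / n_t
    return cr, ar


if __name__ == "__main__":
    # One substituted character out of six, so CR = AR = 5/6.
    print(correct_and_accurate_rate("手写汉字识别", "手写汉宇识别"))
```

Because several alignments can share the minimum edit distance, the split of errors among S_e, D_e and I_e (and therefore CR) can differ slightly between evaluation tools; corpus-level rates are normally obtained by summing the error counts and N_t over all text lines before dividing, rather than averaging per-line rates.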
Pages: 45–59
Number of pages: 15