High Performance Offline Handwritten Chinese Text Recognition with a New Data Preprocessing and Augmentation Pipeline

Cited by: 13
Authors
Xie, Canyu [1]
Lai, Songxuan [1]
Liao, Qianying [1]
Jin, Lianwen [1, 2]
Affiliations
[1] South China Univ Technol, Coll Elect & Informat Engn, Guangzhou, Peoples R China
[2] SCUT, Zhuhai Inst Modern Ind Innovat, Zhuhai 519000, Peoples R China
Source
DOCUMENT ANALYSIS SYSTEMS | 2020 / Vol. 12116
Keywords
Offline Handwritten Chinese Text Recognition (HCTR); Data preprocessing; Data augmentation; CNN-ResLSTM; NEURAL-NETWORK; SEQUENCE; ONLINE; MODEL
DOI
10.1007/978-3-030-57058-3_4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Offline handwritten Chinese text recognition (HCTR) has been a long-standing research topic. Building robust, high-performance offline HCTR systems naturally calls for data preprocessing and augmentation techniques, which, however, have not been fully explored. In this paper, we propose a data preprocessing and augmentation pipeline and a CNN-ResLSTM model for high-performance offline HCTR. The data preprocessing and augmentation pipeline consists of three steps: training text sample generation, text sample preprocessing, and text sample synthesis. The CNN-ResLSTM model is derived by introducing residual connections into the RNN part of the CRNN architecture. Experiments show that, with the proposed CNN-ResLSTM, the data preprocessing and augmentation pipeline effectively and robustly improves system performance: on two standard benchmarks, namely the CASIA-HWDB and the ICDAR-2013 handwriting competition dataset, the proposed approach achieves state-of-the-art results with correct rates of 97.28% and 96.99%, respectively. Furthermore, to make our model more practical, we employ model acceleration and compression techniques to build a fast and compact model without sacrificing accuracy.
Pages: 45-59
Page count: 15