MSdocTr-Lite: A lite transformer for full page multi-script handwriting recognition

Cited by: 6
Authors
Dhiaf, Marwa [1, 2, 3]
Rouhou, Ahmed Cheikh [1]
Kessentini, Yousri [2, 3]
Ben Salem, Sinda [1]
Affiliations
[1] InstaDeep, Tunis, Tunisia
[2] Digital Research Center of Sfax, Sfax 3021, Tunisia
[3] SMRTS Lab (Signals, Systems, aRtificial Intelligence & Networks), Sfax, Tunisia
Keywords
Seq2Seq model; Page-level recognition; Handwritten text recognition; Multi-script; Transformer; Transfer learning
DOI
10.1016/j.patrec.2023.03.020
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The Transformer has quickly become the dominant architecture for various pattern recognition tasks due to its capacity for long-range representation. However, transformers are data-hungry models and need large datasets for training. In Handwritten Text Recognition (HTR), collecting a massive amount of labeled data is a complicated and expensive task. In this paper, we propose a lite transformer architecture for full-page multi-script handwriting recognition. The proposed model offers three advantages. First, to address the common problem of data scarcity, we propose a lite transformer model that can be trained on a moderate amount of data, as is the case for most public HTR datasets, without the need for external data. Second, it can learn the reading order at page level thanks to a curriculum learning strategy, which allows it to avoid line segmentation errors, exploit a larger context, and reduce the need for costly segmentation annotations. Third, it can be easily adapted to other scripts through a simple transfer-learning process that uses only page-level labeled images. Extensive experiments on datasets covering different languages and scripts (French, English, Spanish, and Arabic) show the effectiveness of the proposed model. (c) 2023 Elsevier B.V. All rights reserved.
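The abstract credits a curriculum learning strategy for page-level reading-order learning but does not give the schedule. The sketch below illustrates one common way such a curriculum can be organized, assuming (hypothetically) that training targets grow from single text lines to multi-line blocks and finally to full pages, with lines joined in top-to-bottom order by an explicit line-break token. The names `lines_per_sample` and `build_block`, the linear ramp, and the use of `"\n"` as the line-break token are illustrative assumptions, not the authors' procedure; only the text target of each sample is shown, and the matching image crop is omitted.

```python
import random
from typing import List, Tuple


def lines_per_sample(epoch: int, ramp_epochs: int, max_lines: int) -> int:
    """Hypothetical linear curriculum: 1 line per sample at epoch 0,
    growing to max_lines (roughly a full page) once epoch >= ramp_epochs."""
    if ramp_epochs <= 0 or epoch >= ramp_epochs:
        return max_lines
    # Integer linear interpolation between 1 and max_lines.
    return 1 + (max_lines - 1) * epoch // ramp_epochs


def build_block(page_lines: List[str], k: int, rng: random.Random) -> Tuple[int, str]:
    """Pick k consecutive lines from a page transcript and join them in
    top-to-bottom reading order with an explicit line-break token.
    Returns (index of the first selected line, target string)."""
    if not page_lines:
        raise ValueError("page transcript must contain at least one line")
    k = max(1, min(k, len(page_lines)))  # clamp k to the page length
    start = rng.randrange(len(page_lines) - k + 1)  # random valid start line
    return start, "\n".join(page_lines[start:start + k])


if __name__ == "__main__":
    rng = random.Random(0)
    page = ["line one", "line two", "line three", "line four"]
    for epoch in (0, 5, 10):
        k = lines_per_sample(epoch, ramp_epochs=10, max_lines=len(page))
        start, target = build_block(page, k, rng)
        print(epoch, k, repr(target))
```

Running the script prints one target per epoch, growing from a single line to the whole four-line page. A linear ramp is only one option; a step-wise schedule (lines, then paragraphs, then pages) would fit the abstract's description equally well.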
Pages: 28-34
Number of pages: 7
Related papers (18 in total; first 10 shown)
  • [1] Multi-script handwriting recognition with FOHDEL
    Malaviya, A
    Leja, C
    Peters, L
    1996 BIENNIAL CONFERENCE OF THE NORTH AMERICAN FUZZY INFORMATION PROCESSING SOCIETY - NAFIPS, 1996, : 147 - 151
  • [2] Multi-Script Handwriting Recognition with N-Streams Low Level Features
    Kessentini, Yousri
    Paquet, Thierry
    Benhamadou, AbdelMajid
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 2639 - 2642
  • [3] Page-level Script Identification from Multi-script Handwritten Documents
    Singh, Pawan Kumar
    Dalal, Santu Kumar
    Sarkar, Ram
    Nasipuri, Mita
    2015 THIRD INTERNATIONAL CONFERENCE ON COMPUTER, COMMUNICATION, CONTROL AND INFORMATION TECHNOLOGY (C3IT), 2015
  • [4] LAMIS-MSHD: A Multi-Script offline Handwriting Database
    Djeddi, Chawki
    Siddiqi, Imran
    Gattal, Abdeljalil
    Chibani, Youcef
    Souici-Meslati, Labiba
    El Abed, Haikal
    2014 14TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2014, : 93 - 97
  • [5] Multi-script handwritten digit recognition using multi-task learning
    Gondere, Mesay Samuel
    Schmidt-Thieme, Lars
    Sharma, Durga Prasad
    Scholz, Randolf
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (01) : 355 - 364
  • [6] Full Page Handwriting Recognition via Image to Sequence Extraction
    Singh, Sumeet S.
    Karayev, Sergey
    DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT III, 2021, 12823 : 55 - 69
  • [7] Recognition of Numeric Postal Codes from Multi-script Postal Address Blocks
    Basu, Subhadip
    Das, Nibaran
    Sarkar, Ram
    Kundu, Mahantapas
    Nasipuri, Mita
    Basu, Dipak Kumar
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PROCEEDINGS, 2009, 5909 : 381 - 386
  • [8] Text-independent writer recognition using multi-script handwritten texts
    Djeddi, Chawki
    Siddiqi, Imran
    Souici-Meslati, Labiba
    Ennaji, Abdellatif
    PATTERN RECOGNITION LETTERS, 2013, 34 (10) : 1196 - 1202
  • [9] Gender classification from offline multi-script handwriting images using oriented Basic Image Features (oBIFs)
    Gattal, Abdeljalil
    Djeddi, Chawki
    Siddiqi, Imran
    Chibani, Youcef
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 99 : 155 - 167
  • [10] Start, Follow, Read: End-to-End Full-Page Handwriting Recognition
    Wigington, Curtis
    Tensmeyer, Chris
    Davis, Brian
    Barrett, William
    Price, Brian
    Cohen, Scott
    COMPUTER VISION - ECCV 2018, PT VI, 2018, 11210 : 372 - 388