MSdocTr-Lite: A lite transformer for full page multi-script handwriting recognition

Cited by: 6

Authors:

Dhiaf, Marwa [1,2,3]

Rouhou, Ahmed Cheikh [1]

Kessentini, Yousri [2,3]

Ben Salem, Sinda [1]

Affiliations:

[1] InstaDeep, Tunis, Tunisia

[2] Digital Res Ctr Sfax, Sfax 3021, Tunisia

[3] SMRTS Lab Signals Syst aRtificial Intelligence & n, Sfax, Tunisia

Source:

PATTERN RECOGNITION LETTERS | 2023 / Vol. 169

Keywords:

Seq2Seq model; Page-level recognition; Handwritten text recognition; Multi-script; Transformer; Transfer learning;

DOI:

10.1016/j.patrec.2023.03.020

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The Transformer has quickly become the dominant architecture for various pattern recognition tasks due to its capacity for long-range representation. However, transformers are data-hungry models and need large datasets for training. In Handwritten Text Recognition (HTR), collecting a massive amount of labeled data is a complicated and expensive task. In this paper, we propose a lite transformer architecture for full-page multi-script handwriting recognition. The proposed model comes with three advantages: First, to solve the common problem of data scarcity, we propose a lite transformer model that can be trained on a reasonable amount of data, which is the case of most HTR public datasets, without the need for external data. Second, it can learn the reading order at page-level thanks to a curriculum learning strategy, allowing it to avoid line segmentation errors, exploit a larger context and reduce the need for costly segmentation annotations. Third, it can be easily adapted to other scripts by applying a simple transfer learning process using only page-level labeled images. Extensive experiments on different datasets with different scripts (French, English, Spanish, and Arabic) show the effectiveness of the proposed model. (c) 2023 Elsevier B.V. All rights reserved.

Pages: 28-34

Page count: 7
