Multi-task learning for simultaneous script identification and keyword spotting in document images

Cited by: 9

Authors:

Cheikhrouhou, Ahmed [1,2]

Kessentini, Yousri [1,3]

Kanoun, Slim [2]

Affiliations:

[1] Digital Res Ctr Sfax, Sfax, Tunisia

[2] Univ Sfax, MIRACL Lab, Sfax, Tunisia

[3] SM RTS Lab Signals Syst aRtificial Intelligence &, Sfax, Tunisia

Source:

PATTERN RECOGNITION | 2021 / Vol. 113

Keywords:

CBP; CTC; Keyword spotting; Script identification; Handwritten; RECOGNITION;

DOI:

10.1016/j.patcog.2021.107832

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, an end-to-end multi-task deep neural network was proposed for simultaneous script identification and Keyword Spotting (KWS) in multi-lingual handwritten and printed document images. We introduced a unified approach that addresses both challenges cohesively by designing a novel CNN-BLSTM architecture. The script identification stage involves local and global feature extraction to allow the network to cover more relevant information. Unlike traditional feature fusion approaches, which build a linear feature concatenation, we employed compact bilinear pooling (CBP) to capture pairwise correlations between these features. The script identification result is then injected into the KWS module to eliminate characters of irrelevant scripts and perform the decoding stage in a single-script mode. All the network parameters were trained in an end-to-end fashion using multi-task learning that jointly minimizes the negative log-likelihood (NLL) loss for script identification and the connectionist temporal classification (CTC) loss for KWS. Our approach was evaluated on a variety of public datasets of different languages and writing types. Experiments demonstrated the efficacy of our deep multi-task representation learning compared to state-of-the-art systems for both the keyword spotting and script identification tasks. (c) 2021 Elsevier Ltd. All rights reserved.
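
The single-script decoding step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the script identification result is injected, so everything below is an assumption made for illustration: the toy alphabet, the `SCRIPT_OF` mapping, the function names, and the choice of masking scores to negative infinity followed by greedy (best-path) CTC decoding.

```python
import math

BLANK = "<blank>"

# Hypothetical toy alphabet; each non-blank character belongs to one script.
ALPHABET = [BLANK, "a", "b", "ب", "ت"]
SCRIPT_OF = {"a": "latin", "b": "latin", "ب": "arabic", "ت": "arabic"}


def mask_to_script(frame_scores, script):
    """Set scores of characters from other scripts to -inf; keep the CTC blank."""
    masked = []
    for frame in frame_scores:
        masked.append([
            s if (ch == BLANK or SCRIPT_OF[ch] == script) else -math.inf
            for ch, s in zip(ALPHABET, frame)
        ])
    return masked


def greedy_ctc_decode(frame_scores):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    out = []
    prev = None
    for frame in frame_scores:
        best = max(range(len(frame)), key=lambda i: frame[i])
        if best != prev and ALPHABET[best] != BLANK:
            out.append(ALPHABET[best])
        prev = best
    return "".join(out)


# Made-up per-frame scores (rows: frames, columns: ALPHABET order).
scores = [
    [0.1, 0.9, 0.2, 1.0, 0.0],  # an Arabic character narrowly wins unmasked
    [0.1, 0.9, 0.2, 0.3, 0.0],
    [2.0, 0.1, 0.2, 0.3, 0.0],  # blank frame
    [0.1, 0.2, 1.5, 0.3, 1.6],  # an Arabic character narrowly wins unmasked
]

unmasked = greedy_ctc_decode(scores)
single_script = greedy_ctc_decode(mask_to_script(scores, "latin"))
```

With these made-up scores, unmasked decoding mixes characters from both scripts, while restricting the alphabet to the predicted script yields a purely Latin string. This shows only the general idea of eliminating characters of irrelevant scripts before decoding, not the authors' implementation.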


Pages: 10
