Hybrid HMM/BLSTM system for multi-script keyword spotting in printed and handwritten documents with identification stage

Cited by: 9
Authors
Cheikhrouhou, Ahmed [1, 2]
Kessentini, Yousri [1, 2, 3]
Kanoun, Slim [1]
Affiliations
[1] Univ Sfax, MIRACL Lab, Sfax, Tunisia
[2] Ctr Rech Numer Sfax, Sfax, Tunisia
[3] Univ Rouen, LITIS Lab, EA 4108, St Etienne Du Rouvray, France
Source
NEURAL COMPUTING & APPLICATIONS | 2020 / Volume 32 / Issue 13
Keywords
Script identification; Word spotting; BLSTM; HMM; Handwriting; Machine printed; Multi-script; Arabic; Latin; WORD; RECOGNITION; TEXT; DISCRIMINATION; FEATURES;
DOI
10.1007/s00521-019-04429-w
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a novel script-independent approach for word spotting in printed and handwritten multi-script documents. Since each writing type and script needs to be processed by a specific spotting engine, the proposed system proceeds in two stages. First, a preliminary script identification stage recognizes, at a single level, both the writing type and the script of the input document image. Second, a specific word spotting method is used to spot query words in a large collection of documents. The proposed spotting system is based on a hybrid architecture that combines a deep bidirectional long short-term memory (BLSTM) neural network with a hidden Markov model (HMM). It takes advantage of the deep neural network's strong representation learning power and the HMM's sequential modeling ability. The global system has been evaluated on a mixed corpus of public databases such as KHATT and PKHATT for Arabic script and RIMES for Latin script. The experimental results on script identification and keyword spotting confirm the effectiveness of the proposed approach.
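The two-stage flow described in the abstract (identify the writing type and script, then hand the image to the matching spotting engine) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the names `spot_keywords`, `Hit`, `identify`, and `engines`, and the representation of a class as a `(writing_type, script)` tuple, are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical class label: (writing type, script), e.g. ("handwritten", "Arabic").
ScriptClass = Tuple[str, str]


@dataclass
class Hit:
    """One spotted keyword with a confidence score (hypothetical structure)."""
    keyword: str
    score: float


# Hypothetical engine signature: (image, query words) -> list of hits.
Spotter = Callable[[Any, List[str]], List[Hit]]


def spot_keywords(
    image: Any,
    queries: List[str],
    identify: Callable[[Any], ScriptClass],
    engines: Dict[ScriptClass, Spotter],
) -> List[Hit]:
    """Stage 1: identify writing type and script. Stage 2: dispatch to its engine."""
    script_class = identify(image)
    if script_class not in engines:
        raise ValueError(f"no spotting engine registered for {script_class}")
    return engines[script_class](image, queries)
```

In such a design, each entry of `engines` would wrap one trained spotting model for a given writing type and script, so adding a new script only means registering another engine.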
Pages: 9201 - 9215
Number of pages: 15
Related papers
29 in total
  • [1] Hybrid HMM/BLSTM system for multi-script keyword spotting in printed and handwritten documents with identification stage
    Ahmed Cheikhrouhou
    Yousri Kessentini
    Slim Kanoun
    [J]. Neural Computing and Applications, 2020, 32 : 9201 - 9215
  • [2] HMM Based Keyword Spotting System in Printed/Handwritten Arabic/Latin Documents with Identification Stage
    Rouhou, Ahmed Cheikh
    Kessentini, Yousri
    Kanoun, Slim
    [J]. IMAGE ANALYSIS AND RECOGNITION, ICIAR 2019, PT I, 2019, 11662 : 309 - 320
  • [3] Statistical comparison of classifiers for script identification from multi-script handwritten documents
    Singh, Pawan Kumar
    Sarkar, Ram
    Das, Nibaran
    Basu, Subhadip
    Nasipuri, Mita
    [J]. INTERNATIONAL JOURNAL OF APPLIED PATTERN RECOGNITION, 2014, 1 (02) : 152 - 172
  • [4] Page-level Script Identification from Multi-script Handwritten Documents
    Singh, Pawan Kumar
    Dalal, Santu Kumar
    Sarkar, Ram
    Nasipuri, Mita
    [J]. 2015 THIRD INTERNATIONAL CONFERENCE ON COMPUTER, COMMUNICATION, CONTROL AND INFORMATION TECHNOLOGY (C3IT), 2015,
  • [5] Word-Level Script Identification from Handwritten Multi-script Documents
    Singh, Pawan Kumar
    Mondal, Arafat
    Bhowmik, Showmik
    Sarkar, Ram
    Nasipuri, Mita
    [J]. PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON FRONTIERS OF INTELLIGENT COMPUTING: THEORY AND APPLICATIONS (FICTA) 2014, VOL 1, 2015, 327 : 551 - 558
  • [6] Script Identification of Multi-Script Documents: A Survey
    Ubul, Kurban
    Tursun, Gulzira
    Aysa, Alimjan
    Impedovo, Donato
    Pirlo, Giuseppe
    Yibulayin, Tuergen
    [J]. IEEE ACCESS, 2017, 5 : 6546 - 6559
  • [7] Separating Indic Scripts with matra for Effective Handwritten Script Identification in Multi-Script Documents
    Obaidullah, Sk Md
    Goswami, Chitrita
    Santosh, K. C.
    Das, Nibaran
    Halder, Chayan
    Roy, Kaushik
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2017, 31 (05)
  • [8] HVS inspired system for script identification in Indian multi-script documents
    Pati, PB
    Ramakrishnan, AG
    [J]. DOCUMENT ANALYSIS SYSTEMS VII, PROCEEDINGS, 2006, 3872 : 380 - 389
  • [9] A Texture based approach to Word-level Script Identification from Multi-script Handwritten Documents
    Singh, Pawan Kumar
    Khan, Aparajita
    Sarkar, Ram
    Nasipuri, Mita
    [J]. 2014 6TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS, 2014, : 228 - 232
  • [10] Two-Stage Approach to Keyword Spotting in Handwritten Documents
    Haji, Mehdi
    Ameri, Mohammad R.
    Bui, Tien D.
    Suen, Ching Y.
    Ponson, Dominique
    [J]. DOCUMENT RECOGNITION AND RETRIEVAL XXI, 2014, 9021