Holistic word descriptor for lexicon reduction in handwritten Arabic documents

Cited by: 3
Authors
Elaiwat, Said [1]
Affiliation
[1] Jouf Univ, Coll Comp & Informat Sci, Dept Comp Sci, Sakakah 72441, Saudi Arabia
Keywords
Word descriptor; Local shape descriptor; Lexicon reduction; Multi-scale representation; Contour matching; Arabic handwritten documents; RECOGNITION
DOI
10.1016/j.patcog.2021.108072
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most word recognition systems rely on a predefined lexicon to achieve high performance. Recently, the availability of training/testing data has made it possible to include a huge number of words in the lexicon. However, this leads to high computational cost as the lexicon grows. In addition, including more and more word classes may increase the burden on classification methods and degrade the recognition rate. In this work, we propose a holistic word descriptor for word lexicon reduction in Arabic handwritten documents. The proposed descriptor represents geometrical features of the word shape through three main feature sets, defined from multi-scale convexity/concavity analysis. The first two sets define the number of convexity/concavity peaks and their intensity levels, respectively. In contrast, the last set defines region codes of the peaks by analyzing their regions according to their spatial information. Given a query word and a lexicon (reference dataset), the lexicon reduction system first computes the holistic word descriptor for both the query word and each word in the lexicon. The lexicon is then indexed according to the distances between its entries and the query word descriptor. Finally, the reduced lexicon is formed from the first k entries of the indexed lexicon. The proposed system has been evaluated on two well-known Arabic datasets, namely Ibn Sina and IFN/ENIT. Reported results show superior performance compared to prior art, with 93.7% and 91.2% reduction efficacy for Ibn Sina and IFN/ENIT, respectively. (c) 2021 Elsevier Ltd. All rights reserved.
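The ranking step described in the abstract (index the lexicon by distance to the query descriptor, then keep the first k entries) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the function name `reduce_lexicon`, the toy two-dimensional descriptors, and the use of Euclidean distance are all hypothetical, since the abstract does not name the distance measure or the descriptor layout.

```python
import numpy as np


def reduce_lexicon(query_desc, lexicon_descs, lexicon_words, k):
    """Return the k lexicon words whose descriptors are closest to the query.

    Hypothetical helper: descriptors are assumed to be fixed-length
    numeric vectors, one per lexicon word.
    """
    query = np.asarray(query_desc, dtype=float)
    refs = np.asarray(lexicon_descs, dtype=float)
    # Assumption: Euclidean distance. The abstract only says "distances".
    distances = np.linalg.norm(refs - query, axis=1)
    # Index the lexicon by increasing distance to the query descriptor.
    order = np.argsort(distances, kind="stable")
    # The reduced lexicon is the first k entries of the indexed lexicon.
    return [lexicon_words[i] for i in order[:k]]


# Toy data standing in for real holistic word descriptors.
lexicon_words = ["word_a", "word_b", "word_c", "word_d"]
lexicon_descs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]]
reduced = reduce_lexicon([0.4, 0.0], lexicon_descs, lexicon_words, k=2)
print(reduced)
```

A recognizer would then be run only against the words in `reduced` rather than the full lexicon, which is where the computational saving comes from.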
Pages: 9
Related papers
50 in total
  • [1] Sparse Descriptor for Lexicon Reduction in Handwritten Arabic Documents
    Chherawala, Youssouf
    Wisnovsky, Robert
    Cheriet, Mohamed
    [J]. 2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 3729 - 3732
  • [2] Arabic word descriptor for handwritten word indexing and lexicon reduction
    Chherawala, Youssouf
    Cheriet, Mohamed
    [J]. PATTERN RECOGNITION, 2014, 47 (10) : 3477 - 3486
  • [3] Holistic lexicon reduction for handwritten word recognition
    Madhvanath, S
    Govindaraju, V
    [J]. DOCUMENT RECOGNITION III, 1996, 2660 : 224 - 234
  • [4] TWO-STAGE LEXICON REDUCTION FOR OFFLINE ARABIC HANDWRITTEN WORD RECOGNITION
    Mozaffari, Saeed
    Faez, Karim
    Maergner, Volker
    El Abed, Haikal
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2008, 22 (07) : 1323 - 1341
  • [5] Holistic word recognition for handwritten historical documents
    Lavrenko, V
    Rath, TM
    Manmatha, R
    [J]. FIRST INTERNATIONAL WORKSHOP ON DOCUMENT IMAGE ANALYSIS FOR LIBRARIES, PROCEEDINGS, 2004, : 278 - 287
  • [6] W-TSV: Weighted topological signature vector for lexicon reduction in handwritten Arabic documents
    Chherawala, Youssouf
    Cheriet, Mohamed
    [J]. PATTERN RECOGNITION, 2012, 45 (09) : 3277 - 3287
  • [7] Holistic approach for classifying and retrieving personal Arabic handwritten documents
    Brook, Salama
    Al Aghbar, Zaher
    [J]. ADVANCES ON ARTIFICIAL INTELLIGENCE, KNOWLEDGE ENGINEERING AND DATA BASES, PROCEEDINGS, 2008, : 565 - +
  • [8] A Novel Approach for the Recognition of a Wide Arabic Handwritten Word Lexicon
    Ben Cheikh, I.
    Belaid, A.
    Kacem, A.
    [J]. 19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 3055 - 3058
  • [9] Lexicon reduction using dots for off-line Farsi/Arabic handwritten word recognition
    Mozaffari, Saeed
    Faez, Karim
    Maergner, Volker
    El-Abed, Haikal
    [J]. PATTERN RECOGNITION LETTERS, 2008, 29 (06) : 724 - 734
  • [10] Strategies for Large Handwritten Farsi/Arabic Lexicon Reduction
    Mozaffari, Saeed
    Faez, Karim
    Maergner, Volker
    El-Abed, Haikal
    [J]. ICDAR 2007: NINTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 2007, : 98 - +