Arabic Word Recognition System for Historical Documents using Multiscale Representation Method

Cited by: 0
Authors
Elaiwat, Said [1]
Abu-Zanona, Marwan [2]
Affiliations
[1] Jouf Univ, Coll Comp & Informat Sci, Dept Comp Sci, Sakakah 72441, Saudi Arabia
[2] Al Imam Mohammad IbnSaud Islamic Univ IMSIU, Coll Sharia & Islamic Studies Al Ahsaa, Dept Comp Sci, Al Ahsaa, Saudi Arabia
Keywords
Word recognition; multiscale convexity concavity analysis; historical documents; dynamic time warping; handwriting recognition; features
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
In recent decades, considerable effort has gone into developing automated handwriting recognition systems. Recognition usually involves several complex processes, including image pre-processing, segmentation, feature extraction and matching. The task becomes harder for historical documents, which exhibit skew, document degradation and structural noise. Despite the success achieved for English, the recognition of handwritten Arabic remains a major challenge for many reasons. Arabic, as a Semitic language, differs from other languages (e.g., European languages) in several respects, such as complex structure, implicit characters, concatenation, and writing styles and direction. This work proposes a full recognition system for word recognition from Arabic historical documents. The proposed system presents a novel feature extraction method to define robust features from Arabic words. Prior to feature extraction, each input image is pre-processed and segmented into words. The features of each word/sub-word are then defined based on Multiscale Convexity Concavity (MCC) analysis of the word's contour shape. For feature matching, a circular shift method is proposed to reduce the computational cost, replacing traditional dynamic time warping (DTW), which exhibits high computational cost. Finally, the proposed algorithm was evaluated on a well-known dataset, Ibn Sina, and showed high performance on historical documents at low computational cost.
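The abstract does not spell out how the circular shift matching works, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each word contour is summarized as a fixed-length array with one row per contour point and one column per scale, and that two descriptors are compared by trying every cyclic rotation of one against the other and keeping the smallest mean absolute difference. The function name `circular_shift_distance`, the descriptor layout and the toy values are all hypothetical.

```python
import numpy as np


def circular_shift_distance(a, b):
    """Smallest mean absolute difference between a and any cyclic rotation of b.

    a, b: arrays of shape (n_points, n_scales), one feature row per contour
    point (an assumed layout, not taken from the paper).
    Returns (distance, best_shift).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("descriptors must have the same shape")
    n = a.shape[0]
    # Evaluate every rotation of b along the contour axis; no warping path is built.
    costs = np.array([np.abs(a - np.roll(b, s, axis=0)).mean() for s in range(n)])
    best = int(np.argmin(costs))
    return float(costs[best]), best


# Hypothetical toy descriptor: 8 contour points, 2 scales.
ref = np.array(
    [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [1, 0], [3, 2], [5, 4]],
    dtype=float,
)
# Same contour traced from a different starting point.
query = np.roll(ref, -3, axis=0)

dist, shift = circular_shift_distance(ref, query)
print(dist, shift)
```

Unlike DTW, this comparison only rotates the starting point of a closed contour and never stretches or compresses the sequence, so both descriptors have to be resampled to the same length beforehand.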
Pages: 823-830
Page count: 8