Improving Handwritten Arabic Text Recognition Using an Adaptive Data-Augmentation Algorithm

Cited by: 2
Authors
Eltay, Mohamed [1 ]
Zidouri, Abdelmalek [1 ]
Ahmad, Irfan [2 ]
Elarian, Yousef [3 ]
Affiliations
[1] King Fahd Univ Petr & Minerals, Interdisciplinary Res Ctr Intelligent Secure Syst, Elect Engn Dept, Dhahran, Saudi Arabia
[2] King Fahd Univ Petr & Minerals, Interdisciplinary Res Ctr Intelligent Secure Syst, Informat & Comp Sci Dept, Dhahran, Saudi Arabia
[3] Cambrian Coll, Sudbury, ON, Canada
Keywords
Handwriting recognition; Deep Learning Neural Network; Data augmentation; Recurrent Neural Network; Connectionist temporal classification
DOI
10.1007/978-3-030-86198-8_23
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning has improved the performance of classification and object detection, but it generally requires large amounts of labeled training data. In this paper, we introduce a new data augmentation algorithm that promotes diversity among classes, which here represent the characters of the Arabic script, and that can balance the number of samples across classes. The algorithm assigns a weight to each word in the lexicon, based on the occurrence probabilities of the characters that make up the word. Minority classes receive higher weights than classes that occur frequently in the text. We evaluated the data augmentation technique on a handwritten word recognition task using the publicly available IFN/ENIT and AHDB datasets. The technique yields a significant improvement in results, and we achieve state-of-the-art results on both datasets.
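The weighting idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the function names (`char_probabilities`, `word_weights`, `augmentation_counts`), the choice of mean inverse character probability as the word score, and the `budget` parameter are all hypothetical and are not taken from the paper. The sketch also assumes the lexicon contains unique, non-empty words.

```python
from collections import Counter


def char_probabilities(lexicon):
    """Estimate the occurrence probability of each character over the lexicon."""
    counts = Counter(ch for word in lexicon for ch in word)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def word_weights(lexicon):
    """Assign each word a normalized weight (weights sum to 1).

    Assumption: a word's raw score is the mean inverse probability of its
    characters, so words containing rare characters score higher. The paper
    may use a different formula.
    """
    probs = char_probabilities(lexicon)
    raw = {
        word: sum(1.0 / probs[ch] for ch in word) / len(word)
        for word in lexicon
    }
    norm = sum(raw.values())
    return {word: score / norm for word, score in raw.items()}


def augmentation_counts(lexicon, budget):
    """Split a hypothetical budget of synthetic samples across words by weight."""
    weights = word_weights(lexicon)
    return {word: round(budget * w) for word, w in weights.items()}
```

With a toy lexicon such as `["aab", "aac", "aaa"]`, the words containing the rare characters `b` and `c` receive larger weights than `"aaa"`, so they would be augmented more often, which is the balancing effect the abstract describes.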
Pages: 322-335
Number of pages: 14