ADOCRNet: A Deep Learning OCR for Arabic Documents Recognition

Cited by: 3
Authors
Mosbah, Lamia [1 ]
Moalla, Ikram [1 ,2 ]
Hamdani, Tarek M. [1 ,3 ]
Neji, Bilel [4 ]
Beyrouthy, Taha [4 ]
Alimi, Adel M. [1 ,5 ]
Affiliations
[1] Univ Sfax, Natl Engn Sch Sfax ENIS, ReGIM Lab, REs Grp Intelligent Machines, Sfax 3038, Tunisia
[2] Al Baha Univ, Coll Comp Sci & Informat Technol, Al Bahah 65511, Saudi Arabia
[3] Univ Monastir, Higher Inst Comp Sci Mahdia ISIMa, Monastir 5000, Tunisia
[4] Amer Univ Middle East, Coll Engn & Technol, Egaila 54200, Kuwait
[5] Univ Johannesburg, Fac Engn & Built Environm, Dept Elect & Elect Engn Sci, Johannesburg 3038, South Africa
Keywords
Arabic; document recognition; CNNs; CTC; deep learning; BLSTM; OCR; NEURAL-NETWORKS; CHARACTER-RECOGNITION
DOI
10.1109/ACCESS.2024.3379530
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, optical character recognition (OCR) has experienced a resurgence of interest, especially for contemporary Arabic data. OCR development for printed and handwritten Arabic script remains a challenging task because of the specific characteristics of the Arabic script. In this work, we attempt to address these challenges by creating a deep learning OCR for Arabic document recognition called ADOCRNet. It is a novel deep learning framework whose architecture is built from Convolutional Neural Network (CNN) layers and Bidirectional Long Short-Term Memory (BLSTM) layers, trained using the Connectionist Temporal Classification (CTC) algorithm. To assess the performance of our OCR, the proposed system is evaluated on two printed text datasets, P-KHATT (text line images) and APTI (word images). It is also evaluated on a handwritten Arabic text dataset, IFN/ENIT (word images). In these experiments, the model achieves strong recognition rates on the three datasets. ADOCRNet reaches a Character Error Rate (CER) of 0.01% on the P-KHATT dataset, 0.03% on the APTI dataset, and a Word Error Rate (WER) of 1.09% on the IFN/ENIT dataset, which significantly outperforms the results of current systems.
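The abstract reports results as CER and WER. As a rough illustration of how such figures are commonly computed, the sketch below uses the Levenshtein edit distance between reference and recognized text, summed over a corpus and divided by the total reference length. The function names `edit_distance` and `error_rate` are hypothetical, and the corpus-level normalization is an assumption; the record does not state which exact convention the authors used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Insertions, deletions and substitutions each cost 1.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="char"):
    """Corpus-level error rate in percent.

    unit="char" gives a CER, unit="word" gives a WER (whitespace tokens).
    Assumption: edits are summed over all samples and divided by the
    total reference length, which is one common convention.
    """
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(refs, hyps):
        r = list(ref) if unit == "char" else ref.split()
        h = list(hyp) if unit == "char" else hyp.split()
        total_edits += edit_distance(r, h)
        total_len += len(r)
    if total_len == 0:
        raise ValueError("reference text is empty")
    return 100.0 * total_edits / total_len
```

For instance, `error_rate(["the cat sat"], ["the cat sit"], unit="word")` counts one substituted word out of three reference words, while `unit="char"` on the same pair counts one substituted character out of eleven.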
Pages: 55620 - 55631
Number of pages: 12
Related papers
50 in total
  • [31] Recognition of Arabic Air-Written Letters: Machine Learning, Convolutional Neural Networks, and Optical Character Recognition (OCR) Techniques
    Nahar, Khalid M. O.
    Alsmadi, Izzat
    Al Mamlook, Rabia Emhamed
    Nasayreh, Ahmad
    Gharaibeh, Hasan
    Almuflih, Ali Saeed
    Alasim, Fahad
    SENSORS, 2023, 23 (23)
  • [32] A deep learning model for Ottoman OCR
    Dolek, Ishak
    Kurt, Atakan
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (20)
  • [33] OCR OF ARABIC TEXTS
    AMIN, A
    LECTURE NOTES IN COMPUTER SCIENCE, 1988, 301 : 616 - 625
  • [34] The use of Hartley transform in OCR with application to printed Arabic character recognition
    Mahmoud, Sabri A.
    Mahmoud, Ashraf S.
    PATTERN ANALYSIS AND APPLICATIONS, 2009, 12 (04) : 353 - 365
  • [35] The use of Hartley transform in OCR with application to printed Arabic character recognition
    Sabri A. Mahmoud
    Ashraf S. Mahmoud
    Pattern Analysis and Applications, 2009, 12 : 353 - 365
  • [36] Transfer Learning for Arabic Named Entity Recognition With Deep Neural Networks
    Al-Smadi, Mohammad
    Al-Zboon, Saad
    Jararweh, Yaser
    Juola, Patrick
    IEEE ACCESS, 2020, 8 : 37736 - 37745
  • [37] The Effectiveness of Transfer Learning for Arabic Handwriting Recognition using Deep CNN
    Elleuch, Mohamed
    Jraba, Safa
    Kherallah, Monji
    JOURNAL OF INFORMATION ASSURANCE AND SECURITY, 2021, 16 (02) : 85 - 93
  • [38] Towards Unsupervised Learning for Arabic Handwritten Recognition Using Deep Architectures
    Elleuch, Mohamed
    Tagougui, Najiba
    Kherallah, Monji
    NEURAL INFORMATION PROCESSING, PT I, 2015, 9489 : 363 - 372
  • [39] A Deep Learning based Approach for Recognition of Arabic Sign Language Letters
    Hdioud, Boutaina
    Tirari, Mohammed El Haj
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (04) : 424 - 429
  • [40] A Deep Learning based Arabic Script Recognition System: Benchmark on KHAT
    Ahmad, Riaz
    Naz, Saeeda
    Afzal, Muhammad
    Rashid, Sheikh
    Liwicki, Marcus
    Dengel, Andreas
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2020, 17 (03) : 299 - 305