Arabic calligraphy, typewritten and handwritten using optical character recognition (OCR) system

Times cited: 2
Authors
Al-Barhamtoshy, Hassanin M. [1]
Jambi, Kamal M. [1]
Ahmed, Hany [2]
Mohamed, Shaimaa [3]
Abdo, Sherif M. [2]
Rashwan, Mohsen A. [3]
Affiliations
[1] King Abdulaziz Univ, Dept Informat Technol, Fac Comp & Informat Technol, Jeddah, Saudi Arabia
[2] Cairo Univ, Fac Comp & Informat Syst, Cairo, Egypt
[3] Cairo Univ, Elect & Commun Dept, Cairo, Egypt
Source
BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS | 2019 / Vol. 12 / No. 02
Keywords
ARABIC OCR; SEGMENTATION; FEATURE EXTRACTION; CALLIGRAPHY; TYPEWRITTEN; HANDWRITTEN; HMM;
DOI
10.21786/bbrc/12.2/11
Chinese Library Classification (CLC) numbers
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
This paper describes an Omni OCR system for recognizing typewritten and handwritten Arabic text documents. The proposed Arabic OCR system consists of four main phases. The first phase, pre-processing, covers binarization, skew correction, framing, and noise removal for the prepared documents (the dataset). The second phase segments the preprocessed documents into lines and words. Two main tasks are addressed in this phase: building a language model from the Arabic dictionary used, and detecting the segmented lines and words. The third phase, feature extraction, extracts features from each segmented line or word according to the language model used. Finally, the classifier (recognizer) converts each word or line into a text stream. Each of the four phases is then evaluated to measure the accuracy of the Arabic OCR system. The recognition approach is based on Hidden Markov Models (HMMs); the prepared datasets and the software development tool are also introduced and discussed. State-of-the-art OCR systems currently reach an accuracy of about 70% on unconstrained Arabic text, which is still far from what many practical applications require. This paper therefore proposes an approach based on a language model that accounts for ligatures and overlapping characters. A posterior word-based approach with a tri-gram model is used to recognize the Arabic text. Features are extracted from word images, and patterns are generated using the proposed solution. We test the proposed OCR system on several categories of Arabic documents: early printed (typewritten), printed, historical, and calligraphy documents. On this test bed, the system achieves a 12.5% character error rate, which is compared against the best of the other OCR systems.
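The abstract reports its result as a character error rate (CER). The exact scoring protocol is not given here; a common definition is the character-level Levenshtein edit distance between the recognized text and the ground truth, divided by the number of ground-truth characters. The Python sketch below assumes that definition; the function names `levenshtein` and `character_error_rate` are illustrative and do not come from the paper.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance between the reference text `ref` and the
    recognized text `hyp`: the minimum number of single-character insertions,
    deletions and substitutions separating them."""
    # prev[j] holds the distance between the previous prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # ref character absent from hyp (deletion error)
                curr[j - 1] + 1,     # extra character in hyp (insertion error)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters (assumed definition)."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Toy example, not data from the paper: the recognizer drops the alif in "كتاب".
    print(character_error_rate("كتاب", "كتب"))  # 0.25
```

Under this definition, one wrong character out of eight reference characters gives a CER of 0.125 (12.5%). Python compares strings code point by code point, so Arabic ligatures and presentation forms should be normalized the same way in both the reference and the recognized text before scoring.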
Pages: 283-296
Number of pages: 14