Recognition of printed Arabic text based on global features and decision tree learning techniques

Cited by: 29
Author
Amin, A [1]
Affiliation
[1] Univ New S Wales, Sch Engn & Comp Sci, Sydney, NSW 2052, Australia
Keywords
pattern recognition; printed Arabic text; connected component; skew detection and correction; global features; structural classification; machine learning C4.5; cross-validation;
DOI
10.1016/S0031-3203(99)00114-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine simulation of human reading has been the subject of intensive research for almost three decades. A large number of research papers and reports have already been published on Latin, Chinese and Japanese characters. However, little work has been conducted on the automatic recognition of Arabic characters, either on-line or off-line. This is a result of the lack of adequate support in terms of funding and of resources such as Arabic text databases and dictionaries, and also of the cursive nature of Arabic writing; the problem remains an open research field. This paper presents a new technique for the recognition of Arabic text using the C4.5 machine learning system. The advantages of machine learning are twofold: it can generalize over the large degree of variation between different fonts and writing styles, and recognition rules can be constructed from examples. The technique can be divided into three major steps. The first step is digitization and pre-processing to create connected components, detect the skew of the document image and correct it. The second step is feature extraction, in which global features of the input Arabic word, such as the number of subwords, the number of peaks within each subword, and the number and position of complementary characters, are extracted in order to avoid the difficult segmentation stage. Finally, the C4.5 machine learning system is used to generate a decision tree for classifying each word. The system was tested with 1000 Arabic words in different fonts (each word has 15 samples), and the average correct recognition rate obtained using cross-validation was 92%. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
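The pre-processing step mentioned in the abstract (creating connected components) can be illustrated with a minimal sketch. This is not the paper's implementation, which the abstract does not describe in detail. It assumes a binary image stored as nested lists (1 = ink, 0 = background) and 8-connectivity, and the names `label_components` and `toy` are hypothetical. The component count is only a rough stand-in for global features such as the number of subwords, since dots and other complementary marks also form their own components.

```python
from collections import deque


def label_components(image):
    """Label 8-connected foreground regions of a binary image.

    `image` is a list of equal-length rows of 0/1 values (1 = ink).
    Returns (labels, count): `labels` has the same shape as `image`,
    with background 0 and components numbered 1..count in scan order.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Skip background pixels and pixels that already have a label.
            if image[r][c] != 1 or labels[r][c] != 0:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over the 8 neighbours.
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Toy 5x8 "word" (assumed data): two strokes plus one isolated dot.
toy = [
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 1, 1, 0],
]
labels, n_components = label_components(toy)
print(n_components)  # 3: the dot, the left stroke, the right stroke
```

In a fuller pipeline, per-component statistics such as bounding boxes and positions relative to the baseline would presumably feed the feature vector passed to the decision tree learner; that part is left out here because the abstract does not specify it.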
Pages: 1309-1323
Number of pages: 15
Related papers
50 in total
  • [31] MACHINE RECOGNITION OF PRINTED ARABIC TEXT UTILIZING NATURAL-LANGUAGE MORPHOLOGY
    AMIN, A
    ALFEDAGHI, S
    INTERNATIONAL JOURNAL OF MAN-MACHINE STUDIES, 1991, 35 (06): : 769 - 788
  • [32] Curriculum Learning for Printed Text Line Recognition of Ligature-based Scripts
    Ul-Hasan, Adnan
    Shafait, Faisal
    Liwicki, Marcus
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 1001 - 1005
  • [33] Deep Learning and Recurrent Connectionist-based Approaches for Arabic Text Recognition in Videos
    Yousfi, Sonia
    Berrani, Sid-Ahmed
    Garcia, Christophe
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 1026 - 1030
  • [34] DBN-Based Learning for Arabic Handwritten Digit Recognition Using DCT Features
    AlKhateeb, Jawad H.
    Alseid, Marwan
    2014 6TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY (CSIT), 2014, : 222 - 226
  • [35] Adaptive dissection based subword segmentation of printed Arabic text
    Zidouri, A
    Sarfraz, M
    Shahab, SA
    Jafri, SM
    NINTH INTERNATIONAL CONFERENCE ON INFORMATION VISUALISATION, PROCEEDINGS, 2005, : 239 - 243
  • [36] Water Meter Reading Based on Text Recognition Techniques and Deep Learning
    van, Bay Nguyen
    Nguyen, Anh
    Tran-Trung, Kiet
    Huong, Thien Ho
    Hong, Ha Duong Thi
    Trung, Hau Nguyen
    Hoang, Vinh Truong
    IEEE ACCESS, 2025, 13 : 41422 - 41434
  • [37] Decision tree and deep learning based probabilistic model for character recognition
    Sampath, A. K.
    Gomathi, N.
    JOURNAL OF CENTRAL SOUTH UNIVERSITY, 2017, 24 (12) : 2862 - 2876
  • [38] PC based offline Arabic text recognition system
    Zidouri, A
    Sarfraz, M
    Nawaz, SN
    Ahmad, MJ
    SEVENTH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOL 2, PROCEEDINGS, 2003, : 431 - 434