Recognition of printed Arabic text using machine learning

Cited by: 0
Author
Amin, A [1]
Affiliation
[1] Univ New S Wales, Sch Comp Sci & Engn, Sydney, NSW 2052, Australia
Source
DOCUMENT RECOGNITION V | 1998 / Vol. 3305
Keywords
pattern recognition; printed Arabic text; global feature; structural classification; machine learning
DOI
10.1117/12.304645
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many papers have been concerned with the recognition of Latin, Chinese and Japanese characters. However, although almost a third of a billion people worldwide, in several different languages, use Arabic characters for writing, little research progress, either on-line or off-line, has been achieved towards the automatic recognition of Arabic characters. This is a result of the lack of adequate support in terms of funding and of other utilities such as Arabic text databases and dictionaries, and of course of the cursive nature of its writing rules. The main theme of this paper is the automatic recognition of Arabic printed text using the C4.5 machine learning algorithm. Symbolic machine learning algorithms are designed to accept example descriptions in the form of feature vectors which include a label that identifies the class to which an example belongs. The output of the algorithm is a set of rules that classifies unseen examples based on generalisations from the training set. This ability to generalise is the main attraction of machine learning for handwriting recognition. Samples of a character can be preprocessed into a feature vector representation for presentation to a machine learning algorithm that creates rules for recognising characters of the same class. Symbolic machine learning has several advantages over other learning methods. It is fast in training and in recognition, generalises well, is noise tolerant, and the symbolic representation is easy to understand. The technique can be divided into three major steps. First, pre-processing: the original image is transformed into a binary image using a 300 dpi scanner, and the connected components are then formed. Second, global features of the input Arabic word are extracted, such as the number of subwords, the number of peaks within the subword, and the number and position of the complementary characters. Finally, C4.5 is used for character classification to generate a decision tree.
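The abstract gives no implementation details, so the following is only a minimal sketch of two ideas it names: forming connected components on a binarised image, and the gain-ratio criterion that C4.5 uses to choose a discrete attribute when growing a decision tree. The function names, the fixed threshold of 128, the choice of 8-connectivity, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter, deque
from math import log2


def binarize(gray, threshold=128):
    """Map a grayscale grid to 0/1, treating dark pixels as ink (1).
    The threshold value is an assumption, not taken from the paper."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def count_components(binary):
    """Count 8-connected groups of ink pixels using a breadth-first flood fill."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            count += 1  # found an unvisited ink pixel: a new component
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain divided by split information for one discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(feature_values)
    return gain / split_info if split_info > 0 else 0.0


# Toy 4x5 grayscale image with three separate ink blobs (hypothetical data).
gray = [
    [255,   0, 255, 255,   0],
    [255,   0, 255, 255,   0],
    [255, 255, 255, 255, 255],
    [  0, 255, 255, 255, 255],
]
n_components = count_components(binarize(gray))

# Hypothetical feature column (e.g. a subword count) and class labels.
informative = gain_ratio([1, 1, 2, 2], ["a", "a", "b", "b"])
uninformative = gain_ratio([1, 2, 1, 2], ["a", "a", "b", "b"])
```

In this toy setting `n_components` is 3, the attribute that separates the two classes perfectly gets a gain ratio of 1.0, and the attribute that carries no class information gets 0.0. A full C4.5 implementation would additionally handle continuous attributes, missing values, and pruning, none of which is attempted here.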
Pages: 62-71
Number of pages: 10