Recognition of printed Arabic text based on global features and decision tree learning techniques

Cited by: 29
Authors
Amin, A [1]
Affiliations
[1] Univ New S Wales, Sch Engn & Comp Sci, Sydney, NSW 2052, Australia
Keywords
pattern recognition; printed Arabic text; connected component; skew detection and correction; global features; structural classification; machine learning C4.5; cross-validation
DOI
10.1016/S0031-3203(99)00114-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Machine simulation of human reading has been the subject of intensive research for almost three decades. A large number of research papers and reports have already been published on Latin, Chinese and Japanese characters. However, little work has been conducted on the automatic recognition of Arabic characters, either on-line or off-line. This is a result of the lack of adequate support in terms of funding and of other utilities such as Arabic text databases and dictionaries, and of course of the cursive nature of Arabic writing rules; the problem is still an open research field. This paper presents a new technique for the recognition of Arabic text using the C4.5 machine learning system. The advantages of machine learning are twofold: it can generalize over the large degree of variation between different fonts and writing styles, and recognition rules can be constructed from examples. The technique can be divided into three major steps. The first step is digitization and pre-processing to create connected components, detect the skew of the document image and correct it. The second step is feature extraction, in which global features of the input Arabic word, such as the number of subwords, the number of peaks within the subword, and the number and position of the complementary characters, are extracted to avoid the difficulty of the segmentation stage. Finally, the C4.5 machine learning system is used to generate a decision tree for classifying each word. The system was tested with 1000 Arabic words in different fonts (each word has 15 samples), and the average correct recognition rate obtained using cross-validation was 92%. (C) 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
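As a rough illustration of the final step, the sketch below computes the gain ratio, the split criterion C4.5 is known to use when growing a decision tree, over two of the global features the abstract names. It is a minimal sketch and not the paper's implementation: the toy samples, the feature keys `subwords` and `peaks`, and the class labels are hypothetical, and only discrete features are handled (C4.5's thresholding of continuous attributes and its pruning are left out).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(samples, labels, feature):
    """Gain ratio obtained by splitting the samples on one discrete feature."""
    total = len(labels)
    # Group the class labels by the value each sample takes for the feature.
    partitions = {}
    for sample, label in zip(samples, labels):
        partitions.setdefault(sample[feature], []).append(label)
    # Information gain: entropy before the split minus weighted entropy after.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes features that fragment the data heavily.
    split_info = -sum(len(p) / total * math.log2(len(p) / total)
                      for p in partitions.values())
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: one dict of global features per word image.
samples = [
    {"subwords": 1, "peaks": 2},
    {"subwords": 1, "peaks": 3},
    {"subwords": 2, "peaks": 2},
    {"subwords": 2, "peaks": 3},
]
labels = ["word_a", "word_a", "word_b", "word_b"]

# The root of the tree would test the feature with the highest gain ratio.
best_feature = max(["subwords", "peaks"],
                   key=lambda f: gain_ratio(samples, labels, f))
```

In this toy set the number of subwords separates the two word classes completely while the number of peaks carries no information, so a tree grown this way would test `subwords` at its root.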
Pages: 1309-1323
Page count: 15