A New Dataset Size Reduction Approach for PCA-Based Classification in OCR Application

Cited by: 7
Authors
Shayegan, Mohammad Amin [1]
Aghabozorgi, Saeed [2]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, R&D Ctr, Dept Artificial Intelligence, Image Proc & Pattern, Kuala Lumpur 50603, Malaysia
[2] Univ Malaya, Fac Comp Sci & Informat Technol, Dept Informat Syst, Kuala Lumpur 50603, Malaysia
Keywords
PRINCIPAL COMPONENT ANALYSIS; RECOGNITION; SELECTION; DIGITS
DOI
10.1155/2014/537428
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
A major problem in pattern recognition systems is the large volume of training datasets, which contain duplicate and similar training samples. To overcome this problem, several dataset size reduction and dimensionality reduction techniques have been introduced. The algorithms currently used for dataset size reduction usually remove samples near the class centers or the support vector samples between different classes. However, samples near a class center carry valuable information about the class characteristics, and the support vectors are important for evaluating system efficiency. This paper reports on the use of the Modified Frequency Diagram technique for dataset size reduction. In this newly proposed technique, a training dataset is rearranged and then sieved. The sieved training dataset, together with automatic feature extraction/selection using Principal Component Analysis (PCA), is used in an OCR application. Experimental results on Hoda, one of the largest standard OCR datasets of handwritten Farsi/Arabic numerals, show a recognition rate of about 97%. When a sieved version of the training dataset, only half the size of the original, was used, recognition speed increased 2.28-fold while accuracy decreased by only 0.7%.
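The abstract names PCA as the automatic feature extraction/selection step applied to the (sieved) training set. As a rough illustration only, the sketch below fits PCA on a tiny training set and then classifies a query in the reduced space with a 1-nearest-neighbour rule. It is not the authors' Modified Frequency Diagram sieving, nor necessarily their classifier: the function names (`pca_fit`, `pca_transform`, `nearest_neighbour`), the power-iteration eigen-solver and the 1-NN classifier are hypothetical choices made to keep the example dependency-free.

```python
# Hypothetical illustration of PCA feature extraction + 1-NN classification.
# Not the paper's implementation; pure standard library for portability.
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows, mu):
    """Sample covariance matrix (divides by n - 1) of the mean-centred rows."""
    n, d = len(rows), len(mu)
    centred = [[x - m for x, m in zip(r, mu)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvectors(cov, k, iters=500, seed=0):
    """Leading k eigenvectors of a symmetric PSD matrix, found by power
    iteration with Hotelling deflation (adequate for small toy examples)."""
    d = len(cov)
    a = [row[:] for row in cov]  # working copy that gets deflated
    rng = random.Random(seed)
    vecs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]  # start from a unit vector
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        vecs.append(v)
        # Remove the component just found: A <- A - lam * v v^T
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return vecs


def pca_fit(rows, k):
    """Return the training mean and the top-k principal directions."""
    mu = mean_vector(rows)
    return mu, top_eigenvectors(covariance_matrix(rows, mu), k)


def pca_transform(rows, mu, comps):
    """Project each mean-centred row onto the principal directions."""
    return [[sum((x - m) * c for x, m, c in zip(r, mu, comp)) for comp in comps]
            for r in rows]


def nearest_neighbour(train_feats, train_labels, query):
    """1-NN classifier in the reduced space (squared Euclidean distance)."""
    best = min(range(len(train_feats)),
               key=lambda i: sum((p - q) ** 2 for p, q in zip(train_feats[i], query)))
    return train_labels[best]


if __name__ == "__main__":
    # Toy 3-D data: two classes separated mainly along the (1, 1, 0) direction.
    train = [[0.0, 0.1, 0.0], [0.2, 0.0, 0.1], [5.0, 5.1, 0.0], [5.2, 4.9, 0.1]]
    labels = ["a", "a", "b", "b"]
    mu, comps = pca_fit(train, k=1)
    feats = pca_transform(train, mu, comps)
    query = pca_transform([[4.8, 5.0, 0.05]], mu, comps)[0]
    print(nearest_neighbour(feats, labels, query))  # prints: b
```

Because a nearest-neighbour rule compares every query against every stored training vector, its per-query cost grows with the size of the training set, which is one general reason why a smaller, sieved training set can speed up recognition.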
Pages: 14
Related papers
50 in total (10 shown)
  • [1] A New Classification Method for PCA-based Face Recognition
    Zhou, Xiaofei
    Shi, Yong
    Zhang, Peng
    Nie, Guangli
    Jiang, Wenhan
    [J]. 2009 INTERNATIONAL CONFERENCE ON BUSINESS INTELLIGENCE AND FINANCIAL ENGINEERING, PROCEEDINGS, 2009: 445-449
  • [2] PCA-Based Human Posture Classification
    Tahir, Nooritawati Md
    Hussain, Aini
    Samad, Salina Abdul
    Husain, Hafizah
    [J]. JURNAL TEKNOLOGI, 2007, 46
  • [3] PCA-based Feature Reduction for Hyperspectral Remote Sensing Image Classification
    Uddin, Md. Palash
    Al Mamun, Md.
    Hossain, Md. Ali
    [J]. IETE TECHNICAL REVIEW, 2021, 38(04): 377-396
  • [4] PCA-Based Animal Classification System
    Dandil, Emre
    Polattimur, Rukiye
    [J]. 2018 2ND INTERNATIONAL SYMPOSIUM ON MULTIDISCIPLINARY STUDIES AND INNOVATIVE TECHNOLOGIES (ISMSIT), 2018: 497-501
  • [5] PCA-based dimension reduction for splines
    Van der Linde, A
    [J]. JOURNAL OF NONPARAMETRIC STATISTICS, 2003, 15(01): 77-92
  • [6] PCA-based Noise Reduction in Ambulatory ECGs
    Romero, I.
    [J]. COMPUTING IN CARDIOLOGY 2010, VOL 37, 2010, 37: 677-680
  • [7] PCA-based gene selection for cancer classification
    Kavitha, K. R.
    Ram, Aiswarya V.
    Anandu, S.
    Karthik, S.
    Kailas, Sreeja
    Arjun, N. M.
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (IEEE ICCIC 2018), 2018: 1-4
  • [8] Exploring Dataset Similarities using PCA-based Feature Selection
    Siegert, Ingo
    Boeck, Ronald
    Wendemuth, Andreas
    Vlasenko, Bogdan
    [J]. 2015 INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2015: 387-393
  • [9] Classification of Alzheimer's Disease Stages: An Approach Using PCA-Based Algorithm
    Ahmad, Fayyaz
    Dar, Waqar Mahmood
    [J]. AMERICAN JOURNAL OF ALZHEIMERS DISEASE AND OTHER DEMENTIAS, 2018, 33(07): 433-439
  • [10] A PCA-based approach for brain aneurysm segmentation
    Dakua, Sarada Prasad
    Abinahed, Julien
    Al-Ansari, Abdulla
    [J]. MULTIDIMENSIONAL SYSTEMS AND SIGNAL PROCESSING, 2018, 29(01): 257-277