Density based semi-automatic labeling on multi-feature representations for ground truth generation: Application to handwritten character recognition

Cited by: 5
Authors
Inkeaw, Papangkorn [1 ,2 ]
Udomwong, Piyachat [3 ]
Chaijaruwanich, Jeerayut [4 ]
Affiliations
[1] Chiang Mai Univ, Adv Res Ctr Computat Simulat, Chiang Mai 50200, Thailand
[2] Chiang Mai Univ, Fac Sci, Dept Comp Sci, Chiang Mai 50200, Thailand
[3] Chiang Mai Univ, Int Coll Digital Innovat, Chiang Mai 50200, Thailand
[4] Chiang Mai Univ, Dept Comp Sci, Data Sci Res Ctr, Fac Sci, Chiang Mai 50200, Thailand
Keywords
Semi-supervised learning; Active learning; Ground truth generation; Handwritten character recognition;
DOI
10.1016/j.knosys.2021.106953
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Constructing an efficient recognition mechanism for an optical character recognition system requires a large number of labeled samples as training data. Although character samples can be easily collected from available manuscripts, they often lack class labels, especially for ancient and local alphabets. Creating a training dataset therefore requires experts to annotate a great number of characters manually, which is costly and time-consuming. To considerably reduce the human effort required to construct training datasets, this work proposes a novel semi-automatic labeling method that assumes no initial labeled samples. The proposed method performs an iterative procedure on a nearest neighbor graph that views samples in multiple feature spaces. In each iteration, an expert is first asked to label a relevant unlabeled sample that is automatically selected from the highest-density area of unlabeled samples. The manually annotated label is then propagated to neighboring samples under safe conditions based on sample density and multiple views. The procedure is repeated until all unlabeled samples are labeled. The labeling procedure is evaluated on the MNIST, Devanagari, Thai, and Lanna Dhamma datasets. The results show that the proposed method outperforms state-of-the-art labeling methods, achieving the highest labeling accuracy. In addition, it can handle outlier samples and deal with alphabets that include visually similar characters. Moreover, the recognition performance of a classifier trained on the semi-automatically generated training dataset is comparable to that of a classifier trained on the actual ground truth. (c) 2021 Elsevier B.V. All rights reserved.
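The iterative procedure described in the abstract can be illustrated with a heavily simplified sketch. This is not the authors' algorithm: it uses a single feature space instead of multiple views, estimates density as the inverse mean distance to the k nearest neighbors, and stands in for the paper's "safe conditions" with two assumed rules (the neighbor relation must be mutual, and the neighbor must not be denser than the sample passing on its label). The names `knn_graph`, `semi_automatic_labeling`, and `oracle` are hypothetical.

```python
import numpy as np


def knn_graph(X, k):
    """Return (neighbor indices, neighbor distances) of a brute-force k-NN graph."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq, np.inf)  # a sample is not its own neighbor
    idx = np.argsort(sq, axis=1)[:, :k]
    dist = np.sqrt(np.take_along_axis(sq, idx, axis=1))
    return idx, dist


def semi_automatic_labeling(X, oracle, k=5):
    """Label every row of X, asking `oracle(index)` only for selected seeds.

    Assumed simplifications: one feature view, density = 1 / mean k-NN
    distance, and a stand-in propagation rule (mutual neighbors only,
    never towards a denser sample).
    """
    n = len(X)
    nbrs, dist = knn_graph(X, k)
    density = 1.0 / (dist.mean(axis=1) + 1e-12)
    labels = np.full(n, -1)  # -1 marks "unlabeled"
    queries = 0

    while (labels == -1).any():
        # 1. Pick the unlabeled sample lying in the densest region.
        unlabeled = np.flatnonzero(labels == -1)
        seed = unlabeled[np.argmax(density[unlabeled])]

        # 2. Ask the expert (here: the oracle callback) for its label.
        labels[seed] = oracle(seed)
        queries += 1

        # 3. Propagate the label outward along "safe" graph edges.
        frontier = [seed]
        while frontier:
            i = frontier.pop()
            for j in nbrs[i]:
                mutual = i in nbrs[j]
                not_denser = density[j] <= density[i]
                if labels[j] == -1 and mutual and not_denser:
                    labels[j] = labels[i]
                    frontier.append(j)

    return labels, queries
```

Each pass through the outer loop labels at least the queried seed, so the loop terminates with every sample labeled; the number of oracle calls is the human effort the method aims to keep small.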
Pages: 13