Density based semi-automatic labeling on multi-feature representations for ground truth generation: Application to handwritten character recognition

Cited by: 5

Authors:

Inkeaw, Papangkorn [1,2]

Udomwong, Piyachat [3]

Chaijaruwanich, Jeerayut [4]

Affiliations:

[1] Chiang Mai Univ, Adv Res Ctr Computat Simulat, Chiang Mai 50200, Thailand

[2] Chiang Mai Univ, Fac Sci, Dept Comp Sci, Chiang Mai 50200, Thailand

[3] Chiang Mai Univ, Int Coll Digital Innovat, Chiang Mai 50200, Thailand

[4] Chiang Mai Univ, Dept Comp Sci, Data Sci Res Ctr, Fac Sci, Chiang Mai 50200, Thailand

Source:

KNOWLEDGE-BASED SYSTEMS | 2021 / Vol. 220

Keywords:

Semi-supervised learning; Active learning; Ground truth generation; Handwritten character recognition;

DOI:

10.1016/j.knosys.2021.106953

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A huge number of labeled samples are required as training data to construct an efficient recognition mechanism for an optical character recognition system. Although samples of characters can be easily collected from available manuscripts, they often lack class labels, especially for ancient and local alphabets. Creating a training dataset requires a great number of characters to be manually annotated by experts, which is a costly and time-consuming process. To considerably reduce the human effort required in the construction of training datasets, a novel semi-automatic labeling method is proposed in this work under the assumption that there are no initial labeled samples. The proposed method performs an iterative procedure on a nearest neighbor graph that views samples in multiple feature spaces. In each iteration, an expert is first called upon to label a relevant unlabeled sample that is automatically selected from the highest density area of unlabeled samples. Then, the manually annotated label is propagated to the neighbor samples under safe conditions based on sample density and multi-views. The procedure is repeated until all unlabeled samples are labeled. The labeling procedure of the proposed method is evaluated on the MNIST, Devanagari, Thai, and Lanna Dhamma datasets. The results show that the proposed method outperforms state-of-the-art labeling methods, achieving the highest labeling accuracy. In addition, it can handle outlier samples and deal with alphabets that include visually similar characters. Moreover, the recognition performance of a classifier trained on the semi-automatically generated training dataset is comparable with that of a classifier trained on the actual ground truth. (c) 2021 Elsevier B.V. All rights reserved.
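
The iterative procedure described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the density estimate or the exact "safe conditions", so both are assumptions here: density is taken as the inverse mean k-nearest-neighbor distance averaged over views, and a label is propagated only to unlabeled samples that are neighbors of the queried sample in every view. The names `knn`, `semi_automatic_labeling`, `views`, `ask_expert`, and `k` are hypothetical and do not come from the paper.

```python
import math


def knn(points, k):
    """For each point, return its k nearest neighbours as (index, distance) pairs."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([(j, d) for d, j in dists[:k]])
    return result


def semi_automatic_labeling(views, ask_expert, k=3):
    """Label every sample, starting from no labels at all.

    views      -- list of feature spaces; views[v][i] is sample i in view v
    ask_expert -- callback returning the label of sample i (the human annotator)
    Returns (labels, number_of_expert_queries).
    """
    n = len(views[0])
    graphs = [knn(view, k) for view in views]

    # Assumed density estimate: inverse of the mean k-NN distance over all views.
    density = []
    for i in range(n):
        mean_dist = sum(d for g in graphs for _, d in g[i]) / (k * len(graphs))
        density.append(1.0 / (mean_dist + 1e-12))

    labels = [None] * n
    queries = 0
    while any(label is None for label in labels):
        # Query the expert on the unlabeled sample in the densest region.
        unlabeled = [i for i in range(n) if labels[i] is None]
        seed = max(unlabeled, key=lambda i: density[i])
        labels[seed] = ask_expert(seed)
        queries += 1

        # Assumed safe condition: propagate only to unlabeled samples that are
        # neighbours of the seed in every view (multi-view agreement).
        shared = set.intersection(*({j for j, _ in g[seed]} for g in graphs))
        for j in shared:
            if labels[j] is None:
                labels[j] = labels[seed]
    return labels, queries
```

In this sketch an isolated sample that shares no neighbors across views is never labeled by propagation, so it ends up being sent to the expert directly. That is one simple way to treat outliers and is not necessarily how the paper handles them.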


Pages: 13
