Japanese historical character recognition by focusing on character parts

被引:0
|
作者
Ishikawa, Takuru [1 ]
Miyazaki, Tomo [1 ]
Omachi, Shinichiro [1 ]
机构
[1] Tohoku Univ, Grad Sch Engn, 6-6-05 Aoba Aramakiaza, Sendai, Miyagi 9808579, Japan
关键词
Historical document analysis; Japanese historical character; Learning character parts; Few-shot; Zero-shot recognition;
D O I
10.1016/j.patcog.2023.110181
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Japanese historical documents provide valuable information. Character recognition is a critical technology for the digitalization of historical documents. Sample imbalance is a significant obstacle in recognizing Japanese historical characters, kuzushiji. Thousands of kuzushiji only have less than a few samples. Thus, recognition performance deteriorates greatly in kuzushiji with a few samples. In this study, we propose a framework for transferring knowledge of character parts from font to kuzushiji. The pretraining learns character parts from synthesized font images. However, fine-tuning to kuzushiji is more complex. We propose calculating a mean squared error loss between feature vectors of kuzushiji and font images, resulting in consistent feature vectors in kuzushiji and font. Consequently, we can perform zero-shot recognition for kuzushiji using the font images of zero-sampled kuzushiji. The experimental results show that the proposed method recognized zero-sampled kuzushiji at approximately 48% accuracy. Consequently, we significantly expand the number of recognizable kuzushiji.
引用
收藏
页数:8
相关论文
共 50 条