Read-Write-Learn: Self-Learning for Handwriting Recognition

Cited by: 0
Authors
Boteanu, Adrian [1]
Cheng, Du [1]
Kadioglu, Serdar [1]
Affiliations
[1] Fidelity Investments, Boston, MA 02210 USA
Keywords
handwriting recognition; handwriting generation; self-learning
DOI
10.1145/3573128.3609343
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Handwriting recognition relies on supervised data for training. Annotations typically include both the written text and the author's identity to facilitate the recognition of a particular style. Robust recognition requires a large annotation set, which is not always available for historical texts and low-annotation languages. To mitigate this challenge, we propose the Read-Write-Learn framework, which augments the training process of handwriting recognition with a language model and a handwriting generator. Specifically, in the reading step, we employ a language model to identify text that the recognition model has likely recognized correctly. Then, in the writing step, we generate more training data in the same writing style. Finally, in the learning step, we use the newly generated data to fine-tune the recognition model. The framework allows the recognition model to incrementally converge on the new style. Our experiments on historical handwritten documents demonstrate the benefits of the approach, and we present several examples to showcase improved recognition.
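The read, write, and learn steps described in the abstract can be pictured as a simple loop. The following is only a toy sketch under strong assumptions, not the authors' implementation: every name in it (`read_write_learn`, `ToyRecognizer`, `lm_score`, `generate`, `VOCAB`, the `threshold` and `rounds` parameters) is hypothetical, an "image" is represented by its ground-truth word, the language model is a vocabulary lookup, and the generator simply emits the next vocabulary word as a labeled sample.

```python
# Toy sketch of a read/write/learn loop. All names are hypothetical
# stand-ins; none of this is taken from the paper's code.

VOCAB = ["read", "write", "learn", "style"]


def read_write_learn(images, recognize, lm_score, generate, finetune,
                     threshold=0.5, rounds=3):
    """Run a few rounds and return the synthetic sample count per round."""
    history = []
    for _ in range(rounds):
        # Read: keep transcriptions the language model finds plausible.
        trusted = [(img, text)
                   for img in images
                   for text in [recognize(img)]
                   if lm_score(text) >= threshold]
        # Write: synthesize labeled samples in the style of trusted images.
        synthetic = [pair for img, text in trusted
                     for pair in generate(img, text)]
        # Learn: fine-tune the recognizer on the synthetic pairs.
        finetune(synthetic)
        history.append(len(synthetic))
    return history


class ToyRecognizer:
    """Recognizes only the words it has been trained on (assumption)."""

    def __init__(self, known):
        self.known = set(known)

    def recognize(self, image):
        # Unknown words come out garbled (here: reversed).
        return image if image in self.known else image[::-1]

    def finetune(self, pairs):
        self.known.update(text for _, text in pairs)


def lm_score(text):
    # Stand-in language model: in-vocabulary text is plausible.
    return 1.0 if text in VOCAB else 0.0


def generate(image, text):
    # Stand-in generator: "render" the next vocabulary word in the same style.
    nxt = VOCAB[(VOCAB.index(text) + 1) % len(VOCAB)]
    return [(nxt, nxt)]


model = ToyRecognizer(known=["read"])
history = read_write_learn(VOCAB, model.recognize, lm_score,
                           generate, model.finetune)
```

In this toy setting the recognizer starts out knowing one word and gains one more per round, which loosely mirrors the incremental convergence the abstract mentions. A real system would replace each stand-in with a trained recognition model, a language model, and a handwriting generator.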
Pages: 4