Efficient OCR using simple features and decision trees with backtracking

Cited by: 0
Author
Abuhaiba, Ibrahim S. I. [1 ]
Affiliation
[1] Islam Univ Gaza, Dept Elect & Comp Engn, Gaza, Israel
Source
Keywords
OCR; normalization; projections; geometrical features; decision tree learning
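DOI
Not available
Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract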
In this paper, it is shown that simple and easy-to-compute features, such as those we call sliced horizontal and vertical projections, are adequate to solve the OCR problem efficiently for machine-printed documents. Recognition is achieved using a decision tree supported by backtracking, smoothing, row and column cropping, and other additions that increase the success rate. Symbols from the Times New Roman typeface are used to train the system. With backtracking, smoothing, and cropping activated, the system achieved a success rate above 98% at a recognition time below 30 ms per character. The recognition algorithm was subjected to a hard test in which the original dataset was polluted with additional artificial noise; it maintained a high success rate and a low error rate even for highly polluted images, which is a result of backtracking, smoothing, and row and column cropping. The results indicate that simple features and hints suffice to recognize characters reliably. The error rate can be decreased by increasing the size of the training dataset, and the recognition time can be reduced through programming optimizations and more powerful computers.
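The abstract names its features but does not define them, so the following is only a minimal sketch of one plausible reading, not the paper's method. It assumes a binary glyph image (1 = ink), treats "row and column cropping" as trimming empty rows and columns around the glyph, and treats a "sliced" projection as a row or column ink-count profile pooled into a fixed number of bands. The function names `crop`, `projection`, and `sliced_projection` and the parameter `n_slices` are hypothetical.

```python
def crop(image):
    """Trim empty rows and columns around the glyph (assumed meaning of cropping)."""
    rows = [r for r, row in enumerate(image) if any(row)]
    if not rows:
        return []  # no ink at all
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]


def projection(image, axis):
    """Ink-pixel counts per row (axis=0) or per column (axis=1)."""
    if axis == 0:
        return [sum(row) for row in image]
    return [sum(col) for col in zip(*image)]


def sliced_projection(image, axis, n_slices):
    """Pool a projection profile into n_slices roughly equal bands (assumed definition)."""
    profile = projection(image, axis)
    n = len(profile)
    bounds = [i * n // n_slices for i in range(n_slices + 1)]
    return [sum(profile[a:b]) for a, b in zip(bounds, bounds[1:])]


# A small "L"-shaped glyph with an empty one-pixel border.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
glyph = crop(image)
features = sliced_projection(glyph, 0, 2) + sliced_projection(glyph, 1, 2)
print(features)  # [2, 5, 5, 2]
```

A fixed-length vector of this kind is the sort of input a decision tree could split on; the backtracking and smoothing steps mentioned in the abstract are not sketched here because the record gives no detail about them.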
Pages: 223-243
Page count: 21