Character segmentation using convex-hull techniques

Cited by: 11
Authors
Chang, TC [1]
Chen, SY [1]
Affiliation
[1] Yuan Ze Univ, Dept Comp Engn & Sci, Tao Yuan 32026, Taiwan
Keywords
character segmentation; touching character; convex hull; typographical relation; technical journal contents;
DOI
10.1142/S021800149900046X
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a novel character segmentation method for printed documents. Handling touching, overlapping, and broken characters simultaneously is very difficult. Our strategy is to adjust the binarization parameters so that broken characters are avoided; the trade-off is that adjacent characters may merge heavily into one another. The character segmentation problem therefore reduces to detecting and separating touching characters. In the proposed approach, touching characters are detected using the topological attributes of characters and the typographical relationships between characters. More specifically, the topological attributes are derived from the spatial organization of the concave residua contained in the convex hull enclosing the characters. A shortest-path algorithm, together with the convex-hull information, is then used to separate the composite. Because these convex-hull features are insensitive to character font and size, the method can handle touching characters of various fonts and sizes, including heavily touching characters and overlapping italic characters, without prior slant correction. The proposed method has been applied to extract isolated characters from the contents of technical journals, which contain characters of various fonts and sizes. The experimental results are promising and demonstrate the practicality and feasibility of the proposed method.
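The notion of concave residua inside a convex hull can be illustrated with a small sketch. The abstract does not say how the authors compute the hull or the residua, so everything below is an assumption made for illustration: the text-grid image format (`#` for ink), the function names, and the use of Andrew's monotone-chain hull are not taken from the paper. The sketch only shows that background pixels lying inside the hull of a blob mark its concavities, which is where a touching pair of characters tends to differ from a single character.

```python
# Illustrative sketch only; not the authors' implementation.
# Assumption: a binary image is given as a list of strings, '#' = foreground.
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


def cross(o: Point, a: Point, b: Point) -> int:
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices without collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull: List[Point], p: Point) -> bool:
    """True if p lies inside or on the boundary of the hull polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def concave_residua(image: List[str]) -> List[Point]:
    """Background pixels enclosed by the convex hull of the foreground."""
    fg = {(x, y) for y, row in enumerate(image)
          for x, ch in enumerate(row) if ch == "#"}
    hull = convex_hull(list(fg))
    if len(hull) < 3:  # degenerate blob (point or line): no enclosed area
        return []
    return [(x, y) for y, row in enumerate(image) for x in range(len(row))
            if (x, y) not in fg and inside_hull(hull, (x, y))]


# A U-shaped blob: the gap between the two strokes is a concave residuum.
u_shape = ["#.#",
           "#.#",
           "###"]
print(concave_residua(u_shape))
```

A solid rectangular blob yields no residua under this sketch, while the U-shaped blob yields the two pixels of its gap. Note that this simple test also counts fully enclosed holes as residua; a practical system would additionally group the pixels into connected regions and examine their positions and sizes.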
Pages: 833-858
Number of pages: 26