Min-cut segmentation of cursive handwriting in tabular documents

被引:0
|
作者
Davis, Brian L. [1 ]
Barrett, William A. [1 ]
Swingle, Scott D. [1 ]
机构
[1] Brigham Young Univ, Dept Comp Sci, Provo, UT 84602 USA
来源
关键词
handwriting segmentation; tabular; cursive handwriting; line segmentation; word segmentation; graph cut; min cut; max flow; overlap; 3D;
D O I
10.1117/12.2076228
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Handwritten tabular documents, such as census, birth, death and marriage records, contain a wealth of information vital to genealogical and related research. Much work has been done in segmenting freeform handwriting, however, segmentation of cursive handwriting in tabular documents is still an unsolved problem. Tabular documents present unique segmentation challenges caused by handwriting overlapping cell-boundaries and other words, both horizontally and vertically, as "ascenders" and "descenders" overlap into adjacent cells. This paper presents a method for segmenting handwriting in tabular documents using a min-cut/max-flow algorithm on a graph formed from a distance map and connected components of handwriting. Specifically, we focus on line, word and first letter segmentation. Additionally, we include the angles of strokes of the handwriting as a third dimension to our graph to enable the resulting segments to share pixels of overlapping letters. Word segmentation accuracy is 89.5% evaluating lines of the data set used in the ICDAR2013 Handwriting Segmentation Contest. Accuracy is 92.6% for a specific application of segmenting first and last names from noisy census records. Accuracy for segmenting lines of names from from noisy census records is 80.7%. The 3D graph cutting shows promise in segmenting overlapping letters, although highly convoluted or overlapping handwriting remains an ongoing challenge.
引用
收藏
页数:12
相关论文
共 50 条