Robust Unsupervised Segmentation of Degraded Document Images with Topic Models

Authors
Burns, Timothy J. [1]
Corso, Jason J. [1]
Affiliations
[1] SUNY Buffalo, Buffalo, NY 14260 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Segmentation of document images remains a challenging vision problem. Although document images have a structured layout, capturing enough of it for segmentation can be difficult. Most current methods combine text extraction and heuristics for segmentation, but text extraction is prone to failure and measuring accuracy remains a difficult challenge. Furthermore, when presented with significant degradation, many common heuristic methods fall apart. In this paper, we propose a Bayesian generative model for document images which seeks to overcome some of these drawbacks. Our model automatically discovers different regions present in a document image in a completely unsupervised fashion. We attempt no text extraction, but rather use discrete patch-based codebook learning to make our probabilistic representation feasible. Each latent region topic is a distribution over these patch indices. We capture rough document layout with an MRF Potts model. We take an analysis-by-synthesis approach to examine the model, and provide quantitative segmentation results on a manually-labeled document image data set. We illustrate our model's robustness by providing results on a highly degraded version of our test set.
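The two building blocks named in the abstract, a discrete patch codebook and a Potts smoothness prior, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names `quantize_patch` and `potts_energy`, the squared Euclidean distance, the 4-connected neighborhood, and the `beta` weight are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def quantize_patch(patch: Sequence[float],
                   codebook: Sequence[Sequence[float]]) -> int:
    """Map a flattened image patch to the index of its nearest codeword.

    Assumption: nearest means smallest squared Euclidean distance.
    """
    best_index, best_dist = 0, float("inf")
    for k, word in enumerate(codebook):
        dist = sum((p - w) ** 2 for p, w in zip(patch, word))
        if dist < best_dist:
            best_index, best_dist = k, dist
    return best_index


def potts_energy(labels: List[List[int]], beta: float = 1.0) -> float:
    """Potts energy of a label grid (assumed 4-connected).

    Each pair of adjacent sites with different region labels costs
    `beta`; pairs with equal labels cost nothing, so smooth layouts
    have low energy.
    """
    rows, cols = len(labels), len(labels[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            # Right neighbor
            if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
                energy += beta
            # Bottom neighbor
            if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
                energy += beta
    return energy


if __name__ == "__main__":
    # Hypothetical two-word codebook and a tiny 2x3 region labeling.
    codebook = [[0.0, 0.0], [1.0, 0.0]]
    print(quantize_patch([0.9, 0.1], codebook))
    print(potts_energy([[0, 0, 1], [0, 1, 1]], beta=0.5))
```

In a model of the kind the abstract describes, the codeword index would play the role of a "word" drawn from a region topic, while an energy of this form would discourage neighboring patches from taking different region labels. How the paper combines the two is not stated in this record.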
Pages: 1287-1294
Page count: 8