Adaptive document block segmentation and classification

Cited by: 26
Authors
Shih, FY
Chen, SS
Affiliation
[1] Computer Vision Laboratory, Department of Computer and Information Science, New Jersey Institute of Technology, Newark
DOI
10.1109/3477.537322
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This correspondence presents an adaptive block segmentation and classification technique for office documents received daily that have complex layout structures, such as multiple columns and mixed-mode contents of text, graphics, and pictures. First, an improved two-step block segmentation algorithm based on run-length smoothing decomposes any document into single-mode blocks. Then, a rule-based block classification assigns each block to the text, horizontal/vertical line, graphics, or picture type. The document features and rules used are independent of character font and size and of the scanning resolution. Experimental results show that our algorithms are capable of correctly segmenting and classifying different types of mixed-mode printed documents.
Pages: 797-802
Page count: 6