Structure detection and segmentation of documents using 2D stochastic context-free grammars

Cited by: 6
Authors
Alvaro, Francisco [1]
Cruz, Francisco [2]
Sanchez, Joan-Andreu [1]
Terrades, Oriol Ramos [2]
Benedi, Jose-Miguel [1]
Affiliations
[1] Univ Politecn Valencia, Valencia, Spain
[2] Univ Autonoma Barcelona, Ctr Visio Computador, E-08193 Barcelona, Spain
Keywords
Document image analysis; Stochastic context-free grammars; Text classification features
DOI
10.1016/j.neucom.2014.08.076
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we define a bidimensional extension of stochastic context-free grammars for structure detection and segmentation of images of documents. Two sets of text classification features are used to perform an initial classification of each zone of the page. Then, the document segmentation is obtained as the most likely hypothesis according to a stochastic grammar. We used a dataset of historical marriage license books to validate this approach. We also tested several inference algorithms for probabilistic graphical models and the results showed that the proposed grammatical model outperformed the other methods. Furthermore, grammars also provide the document structure along with its segmentation. (C) 2014 Elsevier B.V. All rights reserved.
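The two-step idea in the abstract (per-zone classification, then choosing the most likely hypothesis under a stochastic grammar) can be illustrated with a minimal sketch. This is not the authors' model: it is a one-dimensional simplification that runs CYK-style Viterbi parsing over a top-to-bottom stack of zones, whereas the paper defines a bidimensional grammar. The region names (`Name`, `Body`, `Tax`), the rule probabilities, the classifier scores, and the function `viterbi_parse` are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (not taken from the paper).
# Binary rules: parent -> list of (left child, right child, rule probability).
BINARY = {
    "Record": [("Name", "Rest", 1.0)],
    "Rest": [("Body", "Tax", 1.0)],
    "Name": [("Name", "Name", 0.3)],
    "Body": [("Body", "Body", 0.4)],
    "Tax": [("Tax", "Tax", 0.2)],
}
# Terminal rules: nonterminal -> (zone class label, rule probability).
# For each nonterminal the rule probabilities sum to 1.
UNARY = {"Name": ("name", 0.7), "Body": ("body", 0.6), "Tax": ("tax", 0.8)}


def viterbi_parse(zone_probs, start="Record"):
    """Return (log-probability, per-zone labels) of the best parse, or None.

    zone_probs is a list of dicts, one per zone from top to bottom, mapping
    a class label to the score given by an (assumed) zone classifier.
    """
    n = len(zone_probs)
    # chart[(i, j)][symbol] = (best log-probability, backpointer)
    chart = defaultdict(dict)

    # Spans of width 1: combine the terminal rule with the classifier score.
    for i, probs in enumerate(zone_probs):
        for sym, (label, p_rule) in UNARY.items():
            p = p_rule * probs.get(label, 0.0)
            if p > 0.0:
                chart[i, i + 1][sym] = (math.log(p), label)

    # Wider spans: try every split point and every binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, rules in BINARY.items():
                    for left, right, p_rule in rules:
                        if left in chart[i, k] and right in chart[k, j]:
                            score = (math.log(p_rule)
                                     + chart[i, k][left][0]
                                     + chart[k, j][right][0])
                            best = chart[i, j].get(parent)
                            if best is None or score > best[0]:
                                chart[i, j][parent] = (score, (k, left, right))

    if start not in chart[0, n]:
        return None  # the grammar cannot derive this zone sequence

    # Follow backpointers to recover one label per zone.
    labels = []

    def walk(i, j, sym):
        back = chart[i, j][sym][1]
        if isinstance(back, str):
            labels.append(back)
        else:
            k, left, right = back
            walk(i, k, left)
            walk(k, j, right)

    walk(0, n, start)
    return chart[0, n][start][0], labels


# Four zones; the third is ambiguous for the classifier alone.
zones = [
    {"name": 0.9, "body": 0.1},
    {"body": 0.6, "tax": 0.4},
    {"body": 0.5, "tax": 0.5},
    {"tax": 0.9, "body": 0.1},
]
result = viterbi_parse(zones)
```

In this toy setting the grammar resolves the zone that the classifier scores as a tie, and the backpointers give the structure (which zones form which region) together with the segmentation, which mirrors the last sentence of the abstract.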
Pages: 147-154
Page count: 8
Related papers
50 in total
  • [1] Page Segmentation of Structured Documents Using 2D Stochastic Context-Free Grammars
    Alvaro, Francisco
    Cruz, Francisco
    Sanchez, Joan-Andreu
    Ramos Terrades, Oriol
    Benedi, Jose-Miguel
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS, IBPRIA 2013, 2013, 7887 : 133 - 140
  • [2] LANGUAGE MODELING USING STOCHASTIC CONTEXT-FREE GRAMMARS
    CORAZZA, A
    DEMORI, R
    GRETTER, R
    SATTA, G
    [J]. SPEECH COMMUNICATION, 1993, 13 (1-2) : 163 - 170
  • [3] Consistency of stochastic context-free grammars
    Gecse, Roland
    Kovacs, Attila
    [J]. MATHEMATICAL AND COMPUTER MODELLING, 2010, 52 (3-4) : 490 - 500
  • [4] Parallel RNA secondary structure prediction using stochastic context-free grammars
    Liu, T
    Schmidt, B
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2005, 17 (14) : 1669 - 1685
  • [5] Pfold: RNA secondary structure prediction using stochastic context-free grammars
    Knudsen, B
    Hein, J
    [J]. NUCLEIC ACIDS RESEARCH, 2003, 31 (13) : 3423 - 3428
  • [6] Modeling of bursty channels using stochastic context-free grammars
    Zhu, WL
    Garcia-Frias, J
    [J]. IEEE 55TH VEHICULAR TECHNOLOGY CONFERENCE, VTC SPRING 2002, VOLS 1-4, PROCEEDINGS, 2002 : 355 - 359
  • [7] Document understanding system using stochastic context-free grammars
    Handley, JC
    Namboodiri, AM
    Zanibbi, R
    [J]. EIGHTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2, PROCEEDINGS, 2005 : 511 - 515
  • [8] STATISTICAL ESTIMATION OF STOCHASTIC CONTEXT-FREE GRAMMARS
    CASACUBERTA, F
    [J]. PATTERN RECOGNITION LETTERS, 1995, 16 (06) : 565 - 573
  • [9] Multithreaded comparative RNA secondary structure prediction using stochastic context-free grammars
    Zsuzsanna Sükösd
    Bjarne Knudsen
    Morten Værum
    Jørgen Kjems
    Ebbe S Andersen
    [J]. BMC Bioinformatics, 12
  • [10] Recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models
    Alvaro, Francisco
    Sanchez, Joan-Andreu
    Benedi, Jose-Miguel
    [J]. PATTERN RECOGNITION LETTERS, 2014, 35 : 58 - 67