Using definite clause grammars to build a global system for analyzing collections of documents

Authors
Chazalon, Joseph [1]
Coüasnon, Bertrand [1]
Affiliations
[1] INSA Rennes, F-35043 Rennes, France
Source
Keywords
document collections; historical documents; batch processing; system design; system generation; data flow; structural recognition; attribute grammars; definite clause grammars; recognition
DOI
10.1117/12.840436
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Collections of documents are sets of heterogeneous documents, such as a specific series of ancient books, linked by their own structural and semantic properties. A particular collection contains document images with specific physical layouts, such as text pages or full-page illustrations, appearing in a specific order. Its contents, such as journal articles, may span several pages that are not necessarily consecutive, which creates strong dependencies between the interpretations of individual pages. In order to build an analysis system that can bring contextual information from the collection to the appropriate recognition modules for each page, we propose to express the structural and semantic properties of a collection with a definite clause grammar. This is made possible by representing collections as streams of document images, and by using extensions to the formalism that we present here. We are then able to automatically generate a parser dedicated to a collection. Besides allowing structural variations and complex information flows, we also show that this approach enables the design of analysis stages that operate on a single document or on a set of documents. The benefit of using context is illustrated with several examples and their formalization in this framework.
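To make the idea of parsing a collection as a stream of page images more concrete, here is a minimal sketch in Python. It is not the authors' system or formalism: the page records, the rule names (`page`, `seq`, `alt`, `many`, `volume`) and the logged "analysis" strings are all hypothetical, and a real definite clause grammar would normally be written in Prolog. The sketch only illustrates two points from the abstract: grammar rules consume pages in order, and a piece of collection-level context (here a volume title) is handed to every page-level step.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Page = Dict[str, str]           # hypothetical page record, e.g. {"kind": "text"}
State = Tuple[int, List[str]]   # (position in the stream, log of analysis steps)
Rule = Callable[[List[Page], State], Iterator[State]]


def page(kind: str, context: str) -> Rule:
    """Terminal rule: consume one page of the given kind, passing it the context."""
    def rule(stream: List[Page], state: State) -> Iterator[State]:
        pos, log = state
        if pos < len(stream) and stream[pos]["kind"] == kind:
            # A real system would call a recognition module here (assumption).
            yield pos + 1, log + [f"page {pos}: {kind} (context: {context})"]
    return rule


def seq(*rules: Rule) -> Rule:
    """Sequence: apply the rules one after another, with backtracking."""
    def rule(stream: List[Page], state: State) -> Iterator[State]:
        if not rules:
            yield state
            return
        for mid in rules[0](stream, state):
            yield from seq(*rules[1:])(stream, mid)
    return rule


def alt(*rules: Rule) -> Rule:
    """Alternative: try each rule in turn, like several clauses of one nonterminal."""
    def rule(stream: List[Page], state: State) -> Iterator[State]:
        for r in rules:
            yield from r(stream, state)
    return rule


def many(r: Rule) -> Rule:
    """Zero or more repetitions, longest match first. `r` must consume a page."""
    def rule(stream: List[Page], state: State) -> Iterator[State]:
        for mid in r(stream, state):
            yield from many(r)(stream, mid)
        yield state
    return rule


def volume(title: str) -> Rule:
    """Hypothetical collection grammar: a title page, then text or illustration pages."""
    body = many(alt(page("text", title), page("illustration", title)))
    return seq(page("title", title), body)


def parse(rule: Rule, stream: List[Page]) -> Optional[List[str]]:
    """Return the analysis log of the first parse that consumes the whole stream."""
    for pos, log in rule(stream, (0, [])):
        if pos == len(stream):
            return log
    return None


stream = [{"kind": "title"}, {"kind": "text"},
          {"kind": "illustration"}, {"kind": "text"}]
result = parse(volume("Vol. 1"), stream)
```

In this sketch, a stream that does not start with a title page is rejected (`parse` returns `None`), which mirrors how a grammar constrains the order of pages in a collection.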
Pages: 11