On the information content of semi-structured databases

Authors
Levene, Mark [1]
Affiliation
[1] Department of Computer Science, University College London, Gower Street, London WC1E 6BT, United Kingdom
Source
Acta Cybernetica | 1998 / Vol. 13 / No. 3
Abstract
In a semi-structured database there is no clear separation between the data and the schema, and the degree to which the database is structured depends on the application. Semi-structured data is naturally modelled as graphs whose labels give semantics to the underlying structure. Such databases subsume the modelling power of recent extensions of flat relational databases: nested databases, which allow the nesting (or encapsulation) of entities, and object databases, which, in addition, allow cyclic references between objects. Because data modelling in a semi-structured environment is so flexible, any given application may admit different ways of entering the data, and it is not always clear when the semantics are the same. To compare different approaches to modelling the data, we investigate a measure of the information content of typical semi-structured databases, which lets us test whether such databases are information-wise equivalent. For the purpose of our investigation we use a graph-based data model, called the hypernode model, as our model for semi-structured data, and we formalise flat, nested and object databases as subclasses of hypernode databases. We use formal language theory to define the context-free grammar induced by a hypernode database, and then formalise the information content of such a database as the language generated by this context-free grammar. Intuitively, the information content of a database provides us with a measure of how flexible the database is in modelling the information from different points of view.
This enables us to prove the following results regarding the expressive power of databases: (1) in general, hypernode databases, and thus semi-structured databases, express the general class of context-free languages; (2) the class of flat databases expresses the class of finite languages whose words are of restricted length between one and four; (3) the class of nested databases expresses the class of finite languages; and (4) the class of object databases expresses the general class of regular languages. We then define two hypernode databases to be information-wise equivalent if they generate the same context-free language. This allows us to prove the following results regarding the computational complexity of determining whether two databases are information-wise equivalent or inequivalent: (1) the problem of determining information-wise equivalence of hypernode databases, and thus semi-structured databases, is, in general, undecidable; (2) the problem of determining information-wise equivalence of flat databases can be solved in time polynomial in the size of the two databases; (3) the problem of determining information-wise inequivalence of nested databases is NP-complete; and (4) the problem of determining information-wise inequivalence of object databases is PSPACE-complete.
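The idea of comparing two databases by the languages their grammars generate can be illustrated with a minimal Python sketch. It is not the paper's construction: the abstract does not say how a hypernode database induces its grammar, so the dictionary encoding of a grammar and the names `language` and `equivalent` are assumptions made for illustration. The sketch covers only the finite-language case (the class the abstract associates with nested databases) by enumerating every word of an acyclic grammar and comparing the resulting sets.

```python
from itertools import product


def language(grammar, start):
    """Enumerate the finite language of an acyclic context-free grammar.

    Hypothetical encoding: `grammar` maps each nonterminal to a list of
    productions, and each production is a tuple of symbols. Any symbol
    that is not a key of `grammar` is treated as a terminal.
    """
    cache = {}
    visiting = set()

    def expand(symbol):
        if symbol not in grammar:
            return {(symbol,)}  # a terminal derives only itself
        if symbol in cache:
            return cache[symbol]
        if symbol in visiting:
            # A cycle means the language may be infinite; enumeration
            # does not apply to that case.
            raise ValueError(f"cyclic grammar at {symbol!r}")
        visiting.add(symbol)
        words = set()
        for production in grammar[symbol]:
            parts = [expand(s) for s in production]
            # Concatenate one derived word per symbol of the production.
            for combo in product(*parts):
                words.add(tuple(t for part in combo for t in part))
        visiting.discard(symbol)
        cache[symbol] = words
        return words

    return expand(start)


def equivalent(g1, s1, g2, s2):
    """Two grammars are equivalent here if they generate the same words."""
    return language(g1, s1) == language(g2, s2)


# Two different ways of structuring the same (made-up) data.
nested = {"S": [("person", "N")], "N": [("name", "alice"), ("name", "bob")]}
flat = {"S": [("person", "name", "alice"), ("person", "name", "bob")]}
smaller = {"S": [("person", "name", "alice")]}

same = equivalent(nested, "S", flat, "S")
different = equivalent(nested, "S", smaller, "S")
```

Here `nested` and `flat` organise their productions differently but generate the same two words, so they compare as equivalent, while `smaller` does not. The number of enumerated words can grow exponentially with the size of the grammar, so this brute-force comparison is only practical for small examples.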
Pages: 257-275