A document model based on relevance modeling techniques for semi-structured information warehouses

Cited by: 0
Authors
Pérez, JM [1 ]
Berlanga, R [1 ]
Aramburu, MJ [1 ]
Affiliation
[1] Univ Jaume 1, Castellon de La Plana, Spain
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
During the last decade, data warehouse and OLAP techniques have helped companies to gather, organize and analyze the structured data they produce. Simultaneously, digital libraries have applied Information Retrieval (IR) mechanisms to query their repositories of unstructured, text-rich documents. In this paper we explain how XML allows for the convergence of these two approaches, making possible the development of warehouses for semi-structured information. So far, proposals for extending data warehouse technology to manage semi-structured information have not been able to exploit the textual contents, mainly because they are not based on a proper document model. In our opinion, such a model must integrate IR and OLAP techniques. We present a set of requirements for semi-structured information warehouses, as well as a document model to support their construction. In this model, new Relevance Modeling mechanisms are used to rank the facts described in the text of the documents according to their relevance to an IR-OLAP query. Preliminary evaluations show the usefulness of the document model.
Pages: 318-327
Page count: 10
Related papers
50 in total
  • [41] A knowledge-based information extraction system for semi-structured labeled documents
    Yang, JY
    Oh, H
    Doh, KG
    Choi, J
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2002, 2002, 2412 : 105 - 110
  • [42] An Algorithm of Semi-structured Data Scheme Extraction Based on OEM Model
    Gong, An
    Yang, Xue-wei
    ADVANCED RESEARCH ON ELECTRONIC COMMERCE, WEB APPLICATION, AND COMMUNICATION, PT 1, 2011, 143 : 315 - 319
  • [43] A framework for automatic reconstruction of a semi-structured rationale from a minutes document
    Onditi, V.
    Sommerville, I.
    2007 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-4, 2007, : 1726 - 1731
  • [44] Extract list data from semi-structured document using clustering
    Xu, H
    Li, JZ
    Xu, P
    Proceedings of the 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE'05), 2005, : 559 - 564
  • [45] A Wavelet Transform Based Structural Similarity Model for Semi-structured Texts
    Su, Jie
    Bao, Junpeng
    KNOWLEDGE DISCOVERY AND DATA MINING, 2012, 135 : 159 - 167
  • [46] Building Wikipedia Ontology with More Semi-structured Information Resources
    Kawakami, Tokio
    Morita, Takeshi
    Yamaguchi, Takahira
    SEMANTIC TECHNOLOGY, JIST 2017, 2017, 10675 : 3 - 18
  • [47] Learning information extraction rules for semi-structured and free text
    Soderland, S
    MACHINE LEARNING, 1999, 34 (1-3) : 233 - 272
  • [48] Unsupervised Extraction of Product Information from Semi-structured Sources
    Walther, Maximilian
    13TH IEEE INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND INFORMATICS (CINTI 2012), 2012, : 257 - 262
  • [49] When Conceptual Model Meets Grammar: A Formal Approach to Semi-structured Data Modeling
    Necasky, Martin
    Mlynkova, Irena
    WEB INFORMATION SYSTEM ENGINEERING-WISE 2010, 2010, 6488 : 279 - 293
  • [50] Supplementing domain knowledge to BERT with semi-structured information of documents
    Chen, Jing
    Wei, Zhihua
    Wang, Jiaqi
    Wang, Rui
    Gong, Chuanyang
    Zhang, Hongyun
    Miao, Duoqian
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 235