A document model based on relevance modeling techniques for semi-structured information warehouses

Times cited: 0
Authors
Pérez, JM [1 ]
Berlanga, R [1 ]
Aramburu, MJ [1 ]
Affiliation
[1] Univ Jaume 1, Castellon de La Plana, Spain
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
During the last decade, data warehouse and OLAP techniques have helped companies gather, organize and analyze the structured data they produce. Meanwhile, digital libraries have applied Information Retrieval (IR) mechanisms to query their repositories of unstructured, text-rich documents. In this paper we explain how XML enables the convergence of these two approaches, making it possible to build warehouses for semi-structured information. So far, proposals for extending data warehouse technology to manage semi-structured information have been unable to exploit the textual contents, mainly because they are not based on a proper document model. In our opinion, such a model must integrate IR and OLAP techniques. We present a set of requirements for semi-structured information warehouses, as well as a document model to support their construction. In this model, new Relevance Modeling mechanisms rank the facts described in the text of the documents according to their relevance to an IR-OLAP query. Preliminary evaluations show the usefulness of the document model.
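The abstract does not spell out the authors' Relevance Modeling mechanisms. As a rough illustration of what relevance modeling usually means in IR, the sketch below applies the classic RM1 relevance model (Lavrenko and Croft) to ranking short "facts". It rests on several assumptions that are not from the paper: a fact is just a hypothetical list of tokens taken from a document, every language model is a unigram model with Dirichlet smoothing (mu = 10 is an arbitrary choice), and facts are ranked by the negative cross-entropy between the relevance model and each fact's model. All function names are illustrative, and the OLAP side of an IR-OLAP query (dimension and measure constraints) is not modeled at all.

```python
import math
from collections import Counter

# Sketch of RM1 relevance-model ranking (Lavrenko & Croft style).
# Hypothetical: "facts" are short token lists; this is NOT the paper's model.


def collection_model(docs):
    """Maximum-likelihood unigram model of the whole collection, P(w|C)."""
    total = Counter()
    for tokens in docs:
        total.update(tokens)
    size = sum(total.values())
    return {w: c / size for w, c in total.items()}


def dirichlet_lm(tokens, coll, mu):
    """Return a function giving the Dirichlet-smoothed P(w|D) for one text."""
    counts = Counter(tokens)
    n = len(tokens)
    return lambda w: (counts[w] + mu * coll.get(w, 0.0)) / (n + mu)


def relevance_model(query, docs, coll, mu):
    """RM1: P(w|R) = sum_D P(w|D) P(D|Q), with a uniform document prior."""
    models = [dirichlet_lm(d, coll, mu) for d in docs]
    terms = [t for t in query if t in coll]  # ignore out-of-vocabulary terms
    # Query likelihood P(Q|D) = prod_t P(t|D); the uniform P(D) cancels out.
    weights = [math.prod(m(t) for t in terms) for m in models]
    z = sum(weights)
    return {w: sum(wt * m(w) for wt, m in zip(weights, models)) / z for w in coll}


def rank_facts(query, facts, docs, mu=10.0):
    """Rank (fact_id, tokens) pairs by -H(R, F) = sum_w P(w|R) log P(w|F)."""
    coll = collection_model(docs)
    rel = relevance_model(query, docs, coll, mu)
    scored = []
    for fact_id, tokens in facts:
        p = dirichlet_lm(tokens, coll, mu)
        # p(w) > 0 for every collection word because mu > 0 and P(w|C) > 0.
        score = sum(pr * math.log(p(w)) for w, pr in rel.items())
        scored.append((fact_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy usage: three "documents" and two "facts" taken from them.
docs = [
    "sales of wine increased in spain".split(),
    "wine exports to france fell".split(),
    "steel production rose in germany".split(),
]
facts = [
    ("f1", "wine exports fell".split()),
    ("f2", "steel production rose".split()),
]
for fact_id, score in rank_facts(["wine"], facts, docs):
    print(fact_id, round(score, 3))
```

Ranking by negative cross-entropy is rank-equivalent to ranking by negative KL divergence from the relevance model, since the entropy of the relevance model is the same for every fact. With this toy data, f1 ranks above f2: the relevance model built from the query "wine" puts more probability on "wine", "exports" and "fell" than on "steel", "production" and "rose".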
Pages: 318-327
Number of pages: 10
Related papers (50 in total)
  • [21] Information Extraction of Strategic Activities based on Semi-structured Text
    Ma, Xubu
    Guo, Ju-E
    2014 Seventh International Joint Conference on Computational Sciences and Optimization (CSO), 2014, pp. 579-583
  • [22] Lexical semantic SLVM for semi-structured document classification
    Wang, Luda
    Long, Jun
    Li, Zude
    He, Ye
    Journal of Information and Computational Science, 2015, 12 (01), pp. 307-316
  • [23] Knowledge extraction from semi-structured data based on fuzzy techniques
    Ceravolo, P
    Nocerino, MC
    Viviani, M
    Knowledge-Based Intelligent Information and Engineering Systems, Pt 3, Proceedings, 2004, 3215, pp. 328-334
  • [24] Managing unstructured and semi-structured information in organisations
    Aitken, Ashley M.
    6th IEEE/ACIS International Conference on Computer and Information Science, Proceedings, 2007, pp. 712-717
  • [25] Low-Dimensionality Information Extraction Model for Semi-structured Documents
    Belhadj, Djedjiga
    Belaïd, Abdel
    Belaïd, Yolande
    Computer Analysis of Images and Patterns, CAIP 2023, Pt I, 2023, 14184, pp. 76-85
  • [27] Cost-effective End-to-end Information Extraction for Semi-structured Document Images
    Hwang, Wonseok
    Lee, Hyunji
    Yim, Jinyeong
    Kim, Geewook
    Seo, Minjoon
    2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), 2021, pp. 3375-3383
  • [28] Graph-based Retrieval Model for Semi-structured Data
    Park, Juneyoung
    Yi, Mun Y.
    2016 International Conference on Big Data and Smart Computing (BigComp), 2016, pp. 361-364
  • [29] Automatic Detection of Reference Elements on Semi-Structured Document Images
    Lanin, Mikhail
    Biznes Informatika-Business Informatics, 2014, 30 (04), pp. 17-23
  • [30] Learning element similarity matrix for semi-structured document analysis
    Yang, Jianwu
    Cheung, William K.
    Chen, Xiaoou
    Knowledge and Information Systems, 2009, 19 (01), pp. 53-78