A document model based on relevance modeling techniques for semi-structured information warehouses

Cited by: 0
Authors
Pérez, JM [1]
Berlanga, R [1]
Aramburu, MJ [1]
Affiliation
[1] Univ Jaume 1, Castellon de La Plana, Spain
DOI
Not available
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
During the last decade, data warehouse and OLAP techniques have helped companies gather, organize and analyze the structured data they produce. Meanwhile, digital libraries have applied Information Retrieval (IR) mechanisms to query their repositories of unstructured, text-rich documents. In this paper we explain how XML allows these two approaches to converge, making it possible to develop warehouses for semi-structured information. So far, proposals for extending data warehouse technology to manage semi-structured information have been unable to exploit the textual contents of the documents, mainly because they are not based on a proper document model. In our opinion, such a model must integrate IR and OLAP techniques. We present a set of requirements for semi-structured information warehouses, together with a document model to support their construction. In this model, new Relevance Modeling mechanisms are used to rank the facts described in the text of the documents according to their relevance to an IR-OLAP query. Preliminary evaluations show the usefulness of the document model.
Pages: 318-327
Number of pages: 10
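The abstract does not spell out how its Relevance Modeling mechanisms rank facts. Purely as an illustration of the general idea, here is a minimal sketch of a generic RM1-style relevance model (in the style of Lavrenko and Croft), not the authors' method. It assumes each fact is a non-empty bag of words taken from document text, uses Jelinek-Mercer smoothing with a hypothetical mixing weight `lam`, treats the facts themselves as the feedback set with a uniform prior, and uses only the keyword part of an IR-OLAP query. The function names `smoothed_lm`, `relevance_model` and `rank_facts` and the example facts are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical sketch: a generic RM1-style relevance model, NOT the paper's mechanism.
# Assumes facts are non-empty bags of words; only the keyword part of an
# IR-OLAP query is used (dimension restrictions are ignored).

def smoothed_lm(doc: List[str], coll_tf: Counter, coll_len: int,
                lam: float = 0.5) -> Dict[str, float]:
    """Jelinek-Mercer smoothed unigram model P(w|D) over the collection vocabulary (0 < lam < 1)."""
    tf = Counter(doc)
    n = len(doc)
    return {w: lam * tf[w] / n + (1 - lam) * coll_tf[w] / coll_len for w in coll_tf}

def relevance_model(query: List[str], docs: List[List[str]],
                    lam: float = 0.5) -> Tuple[Dict[str, float], List[Dict[str, float]]]:
    """Estimate P(w|R) = sum_D P(w|D) P(q|D) / sum_D P(q|D), with a uniform prior P(D)."""
    coll_tf = Counter(w for d in docs for w in d)
    coll_len = sum(coll_tf.values())
    models = [smoothed_lm(d, coll_tf, coll_len, lam) for d in docs]
    # Query likelihood P(q|D): product of the smoothed probabilities of the query terms.
    weights = [math.prod(m.get(t, 0.0) for t in query) for m in models]
    total = sum(weights)
    if total == 0:
        raise ValueError("every query term must occur in the collection")
    rm: Dict[str, float] = {}
    for m, wgt in zip(models, weights):
        for w, p in m.items():
            rm[w] = rm.get(w, 0.0) + p * wgt / total
    return rm, models

def rank_facts(query: List[str], facts: Dict[str, List[str]],
               lam: float = 0.5) -> List[str]:
    """Rank fact ids by the score sum_w P(w|R) log P(w|F); higher means more relevant."""
    ids = list(facts)
    rm, models = relevance_model(query, [facts[i] for i in ids], lam)
    scores = {i: sum(p * math.log(m[w]) for w, p in rm.items())
              for i, m in zip(ids, models)}
    return sorted(ids, key=lambda i: scores[i], reverse=True)

# Usage with hypothetical facts extracted from document text.
facts = {
    "f1": "sales of olive oil rose in castellon".split(),
    "f2": "tile exports fell in valencia".split(),
    "f3": "olive oil prices rose".split(),
}
print(rank_facts(["olive", "oil"], facts))  # → ['f3', 'f1', 'f2']
```

Scoring each fact by cross-entropy against the estimated relevance model is one common way to use such a model; because smoothing keeps every P(w|F) above zero, the logarithm is always defined. A full IR-OLAP query would also restrict facts by their dimension values, which this sketch leaves out.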