A document model based on relevance modeling techniques for semi-structured information warehouses

Cited by: 0
Authors
Pérez, JM [1]
Berlanga, R [1]
Aramburu, MJ [1]
Affiliations
[1] Univ Jaume I, Castellon de La Plana, Spain
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
During the last decade, data warehouse and OLAP techniques have helped companies to gather, organize and analyze the structured data they produce. Simultaneously, digital libraries have applied Information Retrieval (IR) mechanisms to query their repositories of unstructured, text-rich documents. In this paper, we explain how XML allows these two approaches to converge, making it possible to develop warehouses for semi-structured information. So far, proposals for extending data warehouse technology to manage semi-structured information have not been able to exploit the textual contents, mainly because they are not based on a proper document model. In our opinion, such a model must integrate IR and OLAP techniques. We present a set of requirements for semi-structured information warehouses, as well as a document model to support their construction. In this model, new Relevance Modeling mechanisms rank the facts described in the text of the documents according to their relevance to an IR-OLAP query. Preliminary evaluations show the usefulness of the document model.
Pages: 318-327
Number of pages: 10