UJM at INEX 2007: Document Model Integrating XML Tags

Citations: 0
Authors
Gery, Mathias [1]
Largeron, Christine [1]
Thollard, Franck [1]
Affiliations
[1] Univ St Etienne, Hubert Curien Lab, St Etienne, France
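Source
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract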
Different approaches have been used to represent textual documents, based on the Boolean model, the vector space model, or probabilistic models. In text mining as in information retrieval (IR), these models have shown good results for modeling textual documents. They nevertheless do not take document structure into account. In many applications, however, documents are inherently structured (e.g. XML documents). In this article, we propose an extended probabilistic representation of documents that takes into account a certain kind of structural information: logical tags that represent the different parts of the document and formatting tags used to emphasize text. Our approach includes a learning step that estimates the weight of each tag. This weight is related to the probability that a given tag distinguishes the relevant terms.
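The abstract does not give the estimation formula, so the following is only a minimal sketch of what a tag-weight learning step could look like, not the authors' method. It assumes a labeled training set in which each term occurrence is recorded together with its enclosing tag and a relevance flag. The function names, the smoothing constant, and the additive scoring rule are all hypothetical.

```python
from collections import Counter


def estimate_tag_weights(occurrences, smoothing=1.0):
    """Estimate one weight per tag from labeled term occurrences.

    occurrences: iterable of (tag, is_relevant) pairs, one per term occurrence.
    The weight is the smoothed fraction of a tag's term occurrences that are
    relevant (an assumed estimator, chosen here for illustration only).
    """
    relevant = Counter()
    total = Counter()
    for tag, is_relevant in occurrences:
        total[tag] += 1
        if is_relevant:
            relevant[tag] += 1
    return {
        tag: (relevant[tag] + smoothing) / (total[tag] + 2 * smoothing)
        for tag in total
    }


def score_document(term_tags, query_terms, weights, default=0.5):
    """Sum the tag weights of the query-term occurrences in one document.

    term_tags: list of (term, tag) pairs for the document.
    Tags unseen during training fall back to a neutral default weight.
    """
    return sum(
        weights.get(tag, default)
        for term, tag in term_tags
        if term in query_terms
    )


# Toy training data: terms inside <title> are relevant more often than in <p>.
training = [
    ("title", True), ("title", True), ("title", False),
    ("p", False), ("p", False), ("p", True), ("p", False),
]
weights = estimate_tag_weights(training)
doc = [("xml", "title"), ("xml", "p"), ("tag", "p")]
score = score_document(doc, {"xml"}, weights)
```

With this toy data the `title` tag receives a higher weight than `p`, so a query term occurring in a title contributes more to the document score than the same term in a paragraph.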
Pages: 103 - 114
Page count: 12
Related papers
44 in total
  • [1] UJM at INEX 2009 XML Mining Track
    Largeron, Christine
    Moulin, Christophe
    Gery, Mathias
    [J]. FOCUSED RETRIEVAL AND EVALUATION, 2010, 6203 : 426 - +
  • [2] UJM at INEX 2008 XML Mining Track
    Gery, Mathias
    Largeron, Christine
    Moulin, Christophe
    [J]. ADVANCES IN FOCUSED RETRIEVAL, 2009, 5631 : 446 - +
  • [3] UJM at INEX 2008: Pre-impacting of Tags Weights
    Gery, Mathias
    Largeron, Christine
    Thollard, Franck
    [J]. ADVANCES IN FOCUSED RETRIEVAL, 2009, 5631 : 46 - +
  • [4] XML structure mapping - Application to the PASCAL/INEX 2006 XML document mining track
    Maes, Francis
    Denoyer, Ludovic
    Gallinari, Patrick
    [J]. COMPARATIVE EVALUATION OF XML INFORMATION RETRIEVAL SYSTEMS, 2007, 4518 : 540 - 551
  • [5] Analyzing the properties of XML fragments decomposed from the INEX document collection
    Hatano, K
    Kinutani, H
    Amagasa, T
    Mori, Y
    Yoshikawa, M
    Uemura, S
    [J]. ADVANCES IN XML INFORMATION RETRIEVAL, 2005, 3493 : 168 - 182
  • [6] Integrating document and data retrieval based on XML
    Jan-Marco Bremer
    Michael Gertz
    [J]. The VLDB Journal, 2006, 15 : 53 - 83
  • [7] Integrating document and data retrieval based on XML
    Bremer, JM
    Gertz, M
    [J]. VLDB JOURNAL, 2006, 15 (01) : 53 - 83
  • [8] An XML Document Warehouse model
    Nassis, Vicky
    Dillon, Tharam S.
    Rajagopalapillai, Rajugan
    Rahayu, Wenny
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2006, 3882 : 513 - 529
  • [9] Mapping Bitemporal XML Data Model to XML Document
    Tang, Na
    Tang, Yong
    [J]. COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN IV, 2008, 5236 : 342 - 352
  • [10] Integrating text retrieval and image retrieval in XML document searching
    Tjondronegoro, D.
    Zhang, J.
    Gu, J.
    Nguyen, A.
    Geva, S.
    [J]. ADVANCES IN XML INFORMATION RETRIEVAL AND EVALUATION, 2006, 3977 : 511 - 524