Using proximity and tag weights for focused retrieval in structured documents

Cited by: 2
Authors
Beigbeder, Michel [1]
Gery, Mathias [2]
Largeron, Christine [2]
Affiliations
[1] Ecole Natl Super Mines, F-42023 St Etienne, France
[2] Univ Lyon, St Etienne, France
Keywords
Focused information retrieval; Structured information retrieval; Proximity; XML; Tags; Term proximity
DOI
10.1007/s10115-014-0767-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Focused information retrieval is concerned with the retrieval of small units of information. In this context, both the structure of the documents and the proximity among query terms have been found useful for improving retrieval effectiveness. In this article, we propose an approach combining the proximity of the terms and the tags which mark these terms. Our approach is based on a Fetch and Browse method in which the fetch step is performed with BM25 and the browse step with a structure-enhanced proximity model. In this way, the ranking of a document depends not only upon the existence of the query terms within the document but also upon the tags which mark these terms. Thus, a document tends to be highly relevant when query terms are close together and are emphasized by tags. The evaluation of this model on a large XML structured collection provided by the INEX 2010 XML IR evaluation campaign shows that the use of term proximity and structure improves the retrieval effectiveness of BM25 in the context of focused information retrieval.
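The two-step idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the abstract gives no formulas, so the triangular influence function, the `min` combination across query terms, the window size `k`, the `tag_weights` values and all function names below are assumptions chosen only for illustration. Documents are assumed to be lists of (term, tag) pairs; the fetch step uses a standard Okapi BM25 formulation and the browse step re-ranks the fetched documents with a tag-weighted proximity score.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Fetch step: standard Okapi BM25 over tokenised documents (lists of terms)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            df = sum(1 for other in docs if t in other)
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def proximity_score(doc, query, tag_weights, k=5):
    """Browse step (hypothetical): each occurrence of a query term spreads a
    triangular influence of width k, scaled by the weight of its enclosing tag.
    At every position the per-term influences are combined with min, so a
    position only scores when all query terms are nearby."""
    total = 0.0
    for x in range(len(doc)):
        per_term = []
        for t in query:
            best = 0.0
            for p, (term, tag) in enumerate(doc):
                if term == t:
                    w = tag_weights.get(tag, 1.0)  # unknown tags get weight 1.0
                    best = max(best, w * max(0.0, 1.0 - abs(x - p) / k))
            per_term.append(best)
        total += min(per_term)
    return total


def fetch_and_browse(docs, query, tag_weights, top_n=2, k=5):
    """Fetch the top_n documents by BM25, then re-rank them by proximity score."""
    terms_only = [[term for term, _ in d] for d in docs]
    bm25 = bm25_scores(terms_only, query)
    fetched = sorted(range(len(docs)), key=lambda i: bm25[i], reverse=True)[:top_n]
    return sorted(
        fetched,
        key=lambda i: proximity_score(docs[i], query, tag_weights, k),
        reverse=True,
    )


# Toy collection: (term, tag) pairs. Tag names and weights are made up.
filler = ["a", "b", "c", "d", "e", "f"]
docs = [
    # Query terms far apart, both in plain paragraph text.
    [("xml", "p")] + [(w, "p") for w in filler] + [("retrieval", "p")],
    # Query terms adjacent and marked by an emphasising tag.
    [("xml", "title"), ("retrieval", "title")] + [(w, "p") for w in filler],
    # No query terms at all.
    [(w, "p") for w in filler + ["g", "h"]],
]
query = ["xml", "retrieval"]
tag_weights = {"title": 2.0, "p": 1.0}

ranking = fetch_and_browse(docs, query, tag_weights)
print(ranking)
```

In this toy collection the first two documents contain the same query terms the same number of times and have equal length, so BM25 alone cannot separate them; the browse step prefers the document whose query terms are adjacent and carry the higher-weighted tag.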
Pages: 51-76
Number of pages: 26