Improving Information Retrieval Through a Global Term Weighting Scheme

Cited by: 0
Authors
Cuellar, Daniel [1]
Diaz, Elva [1]
Ponce-de-Leon-Senti, Eunice [1]
Affiliations
[1] UAA, Basic Sci Ctr, Dept Comp Sci, Aguascalientes 20131, Aguascalientes, Mexico
Keywords
Information retrieval; Indexing; Vector space model; Term weighting; Marginal distribution; Weighting scheme
DOI
10.1007/978-3-319-19264-2_24
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The output of an information retrieval system is an ordered list of documents corresponding to the user query, which is represented by an input list of terms. This output relies on the estimated similarity between each document and the query, and that similarity depends in turn on the weighting scheme used for the terms of the document index. Term weighting therefore plays a major role in estimating this similarity. This paper proposes a new term weighting approach for information retrieval based on marginal frequencies, that is, the global counts of term frequencies over the corpus of documents, whereas conventional term weighting schemes such as the normalized term frequency take into account the term frequencies within particular documents. The presented experiment shows the advantages and disadvantages of the proposed retrieval scheme. Performance measures such as precision, recall, and F-score are computed on classical benchmarks such as CACM to validate the experimental results.
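The contrast drawn in the abstract can be illustrated with a small sketch. The abstract does not give the paper's actual formula, so the global weight below is only an assumed form (a term's total count over the corpus divided by the total number of tokens), and every function name is hypothetical. The per-document baseline uses one common definition of normalized term frequency (count divided by the largest count in the document), and ranking uses cosine similarity as in the vector space model.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would also strip
    # punctuation, remove stop words, and stem.
    return text.lower().split()


def normalized_tf(tokens):
    # Per-document weight: term count divided by the largest count
    # in the same document.
    counts = Counter(tokens)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {t: c / max_count for t, c in counts.items()}


def marginal_weights(corpus_tokens):
    # ASSUMED global weight: total count of the term over the whole
    # corpus divided by the total number of tokens in the corpus.
    totals = Counter()
    for tokens in corpus_tokens:
        totals.update(tokens)
    n = sum(totals.values())
    return {t: c / n for t, c in totals.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs, weights_for):
    # Score each document against the query; highest score first.
    q = weights_for(tokenize(query))
    scored = [(i, cosine(q, weights_for(tokenize(d)))) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda s: s[1], reverse=True)


def precision_recall_f(retrieved, relevant):
    # Set-based precision, recall, and balanced F-score.
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    docs = [
        "term weighting for retrieval",
        "cooking pasta at home",
        "vector space retrieval model",
    ]
    g = marginal_weights([tokenize(d) for d in docs])

    def global_vec(tokens):
        # Every term present in the text gets its corpus-wide weight.
        return {t: g.get(t, 0.0) for t in set(tokens)}

    print(rank("term weighting", docs, global_vec))
    print(rank("term weighting", docs, normalized_tf))
    print(precision_recall_f(retrieved=[0, 1], relevant=[0, 2]))
```

Passing a different `weights_for` function to `rank` swaps the weighting scheme while keeping the similarity measure fixed, which is one way to compare schemes on a benchmark with known relevance judgments.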
Pages: 246-257
Number of pages: 12