A nonparametric term weighting method for information retrieval based on measuring the divergence from independence

Cited by: 0
Authors
İlker Kocabaş
Bekir Taner Dinçer
Bahar Karaoğlan
Affiliations
[1] Ege University, International Computer Institute
[2] Muğla University, Department of Statistics
[3] Muğla University, Department of Computer Engineering
Source
Information Retrieval | 2014 / Vol. 17
Keywords
Information retrieval; Nonparametric index term weighting; Statistical dependence; Pearson's chi-square statistic
DOI
Not available
Abstract
In this article, we introduce an out-of-the-box automatic term weighting method for information retrieval. The method measures how far the frequency of occurrence of a term in a document diverges from what would be expected if terms and documents were independent. Divergence from independence has a well-established underlying statistical theory. It provides a plain, mathematically tractable, and nonparametric way of weighting terms and, moreover, requires no term frequency normalization. Besides its sound theoretical background, experiments on TREC test collections show that its performance is generally comparable to that of state-of-the-art term weighting methods. In both its theoretical and practical aspects, it is a simple but powerful baseline alternative to the state-of-the-art methods.
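The idea in the abstract can be illustrated with a minimal sketch. It is not taken from the article itself: it assumes that the expected frequency of a term in a document under independence is (collection frequency of the term × document length) / total tokens in the collection, and that the divergence is expressed as a standardized residual in the spirit of Pearson's chi-square statistic, damped with a logarithm. The function name `dfi_weights` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def dfi_weights(docs):
    """Weight terms by their divergence from independence (illustrative sketch).

    docs: list of documents, each a list of tokens.
    Returns one dict per document mapping term -> weight.
    Assumed form (not confirmed by the abstract): standardized residual
    (tf - expected) / sqrt(expected), passed through log2(1 + x).
    """
    doc_lens = [len(d) for d in docs]
    total_tokens = sum(doc_lens)
    # Frequency of each term over the whole collection.
    collection_tf = Counter(t for d in docs for t in d)

    weights = []
    for doc, doc_len in zip(docs, doc_lens):
        w = {}
        for term, tf in Counter(doc).items():
            # Frequency expected if term and document were independent.
            expected = collection_tf[term] * doc_len / total_tokens
            if tf > expected:
                z = (tf - expected) / math.sqrt(expected)
                w[term] = math.log2(z + 1.0)
            else:
                # The term occurs no more often than chance predicts.
                w[term] = 0.0
        weights.append(w)
    return weights


docs = [["a", "a", "b"], ["b", "c", "c", "c"], ["a", "b", "c"]]
weights = dfi_weights(docs)
```

Note that the sketch has no tunable parameters, and document length enters only through the expected frequency, which is one way to read the abstract's claim that no separate term frequency normalization is needed.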
Pages: 153 - 176
Page count: 23
Related papers
50 in total
  • [31] Semi-parametric and Non-parametric Term Weighting for Information Retrieval
    Metzler, Donald
    Zaragoza, Hugo
    ADVANCES IN INFORMATION RETRIEVAL THEORY, 2009, 5766 : 42 - 53
  • [32] A competitive term selection method for information retrieval
    Lopez, Franco Rojas
    Jimenez-Salazar, Hector
    Pinto, David
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2007, 4394 : 468 - +
  • [33] Image Retrieval Method Based on Semantic Concepts and Feature Weighting
    Zhao, Jiandong
    Kui, Lu
    Xin, Jin
    2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING APPLICATIONS (CSEA 2015), 2015, : 299 - 303
  • [34] A novel term weighting scheme based on discrimination power obtained from past retrieval results
    Song, Sa-kwang
    Myaeng, Sung Hyon
    INFORMATION PROCESSING & MANAGEMENT, 2012, 48 (05) : 921 - 932
  • [35] A image retrieval method using TFIDF based weighting scheme
    Suzuki, Yu
    Mitsukawa, Masahiro
    Kawagoe, Kyoji
    DEXA 2008: 19TH INTERNATIONAL CONFERENCE ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2008, : 112 - +
  • [36] An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions
    Cummins, Ronan
    O'Riordan, Colm
    ARTIFICIAL INTELLIGENCE REVIEW, 2007, 28 (01) : 51 - 68
  • [37] How sensitive are the term-weighting models of information retrieval to spam Web pages?
    Arslan, Ahmet
    INFORMATION PROCESSING LETTERS, 2019, 144 : 16 - 24
  • [38] Evolving General Term-Weighting Schemes for Information Retrieval: Tests on Larger Collections
    Ronan Cummins
    Colm O’riordan
    Artificial Intelligence Review, 2005, 24 : 277 - 299
  • [39] Term-Weighting in Information Retrieval using Genetic Programming: A three stage process
    Cummins, Ronan
    O'Riordan, Colm
    ECAI 2006, PROCEEDINGS, 2006, 141 : 793 - 794
  • [40] A Comparison of Recent Information Retrieval Term-Weighting Models Using Ancient Datasets
    Alkilinc, Ahmet
    Arslan, Ahmet
    2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP), 2018