A nonparametric term weighting method for information retrieval based on measuring the divergence from independence

Authors
İlker Kocabaş
Bekir Taner Dinçer
Bahar Karaoğlan
Affiliations
[1] Ege University, International Computer Institute
[2] Muğla University, Department of Statistics
[3] Muğla University, Department of Computer Engineering
Source
Information Retrieval | 2014 / Vol. 17
Keywords
Information retrieval; Nonparametric index term weighting; Statistical dependence; Pearson’s chi-square statistic
DOI
Not available
Abstract
In this article, we introduce an out-of-the-box automatic term weighting method for information retrieval. The method measures how far the observed frequency of a term in a document diverges from the frequency expected if terms occurred independently of documents. Divergence from independence has a well-established underlying statistical theory. It provides a plain, mathematically tractable, and nonparametric way of term weighting; moreover, it requires no term frequency normalization. Besides its sound theoretical background, the results of experiments performed on TREC test collections show that its performance is, in general, comparable to that of state-of-the-art term weighting methods. In both its theoretical and practical aspects, it is a simple but powerful baseline alternative to the state-of-the-art methods.
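The idea in the abstract can be illustrated with a minimal sketch. It is not taken from the article: the function name `dfi_weight`, its parameters, the standardized (Pearson-style) residual, and the `log2(1 + z)` damping are all assumptions chosen for illustration, and the paper's own formulas may differ. The sketch assumes the expected frequency of a term in a document under independence is the term's collection frequency times the document length, divided by the collection length.

```python
import math


def dfi_weight(tf, doc_len, term_coll_freq, coll_len):
    """Illustrative divergence-from-independence weight (assumed form).

    tf             -- observed frequency of the term in the document
    doc_len        -- number of tokens in the document
    term_coll_freq -- total occurrences of the term in the collection
    coll_len       -- total number of tokens in the collection
    """
    # Frequency expected if the term occurred independently of the document.
    expected = term_coll_freq * doc_len / coll_len

    # A term that occurs no more often than chance predicts carries no weight.
    if tf <= expected:
        return 0.0

    # Standardized deviation of observed from expected (assumed measure).
    z = (tf - expected) / math.sqrt(expected)

    # Logarithmic damping (assumed) so very large deviations do not dominate.
    return math.log2(1.0 + z)


if __name__ == "__main__":
    # Term seen 4 times in a 100-token document; 100 occurrences in 10,000 tokens.
    print(dfi_weight(4, 100, 100, 10_000))
```

Note that document length enters only through the expected frequency, which is one way to read the abstract's claim that no separate term frequency normalization step or tuning parameter is required.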
Pages: 153-176
Number of pages: 23