A nonparametric term weighting method for information retrieval based on measuring the divergence from independence

Authors
İlker Kocabaş
Bekir Taner Dinçer
Bahar Karaoğlan
Affiliations
[1] Ege University, International Computer Institute
[2] Muğla University, Department of Statistics
[3] Muğla University, Department of Computer Engineering
Source
Information Retrieval | 2014 / Vol. 17
Keywords
Information retrieval; Nonparametric index term weighting; Statistical dependence; Pearson's Chi-Square statistics
Abstract
In this article, we introduce an out-of-the-box automatic term weighting method for information retrieval. The method measures how far a term's frequency of occurrence in a document diverges from the frequency expected if terms and documents were independent. Divergence from independence has a well-established underlying statistical theory. It provides a plain, mathematically tractable, and nonparametric way of term weighting and, moreover, requires no term frequency normalization. Besides its sound theoretical background, the results of experiments performed on TREC test collections show that its performance is, in general, comparable to that of state-of-the-art term weighting methods. In both its theoretical and practical aspects, it is a simple but powerful baseline alternative to the state-of-the-art methods.
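The abstract does not state the weighting formula, so the following is only a minimal sketch of the general idea, not the authors' exact method. It assumes that the expected frequency of a term in a document under independence is (collection frequency of the term × document length) / (total tokens in the collection), and that the weight is a log-damped standardized residual in the spirit of Pearson's chi-square statistic named in the keywords. The function name `dfi_weights` and the log damping are illustrative assumptions.

```python
import math
from collections import Counter


def dfi_weights(docs):
    """Weight terms by divergence from independence (illustrative sketch).

    docs: list of documents, each a list of tokens.
    Returns one dict per document mapping term -> weight.
    """
    total_tokens = sum(len(d) for d in docs)
    # Frequency of each term over the whole collection.
    collection_tf = Counter(t for d in docs for t in d)

    weights = []
    for doc in docs:
        doc_weights = {}
        for term, observed in Counter(doc).items():
            # Assumed independence model: the term is spread over documents
            # in proportion to document length.
            expected = collection_tf[term] * len(doc) / total_tokens
            if observed > expected:
                # Standardized residual, damped with a logarithm (assumption).
                z = (observed - expected) / math.sqrt(expected)
                doc_weights[term] = math.log2(z + 1)
            else:
                # Terms at or below their expected frequency get no weight.
                doc_weights[term] = 0.0
        weights.append(doc_weights)
    return weights


docs = [
    ["a", "a", "a", "b"],
    ["b", "c", "c", "c"],
    ["a", "b", "c", "d"],
]
weights = dfi_weights(docs)
```

In this toy collection, "a" is weighted positively in the first document because it occurs there more often than independence would predict, while "b" occurs exactly as often as expected and gets zero weight. Document length enters only through the expected frequency, which is one way to read the abstract's remark that no separate term frequency normalization is required.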
Pages: 153-176
Page count: 23