Modeling term proximity for probabilistic information retrieval models

Cited by: 60
Authors
He, Ben [1 ]
Huang, Jimmy Xiangji [1 ]
Zhou, Xiaofeng [1 ]
Affiliation
[1] York Univ, Sch Informat Technol, Informat Retrieval & Knowledge Management Res Lab, Toronto, ON M3J 2R7, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Proximity; Probabilistic models; BM25; WEB;
DOI
10.1016/j.ins.2011.03.007
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Proximity among query terms has been found to be useful for improving retrieval performance. However, its application to classical probabilistic information retrieval models, such as Okapi's BM25, remains a challenging research problem. In this paper, we propose to improve the classical BM25 model by utilizing term proximity evidence. Four novel methods are proposed to model the proximity between query terms: a window-based N-gram Counting method, and Survival Analysis over three different statistics, namely the Poisson process, an exponential distribution, and an empirical function. Through extensive experiments on standard TREC collections, our proposed proximity-based BM25 model, called BM25P, is compared to strong state-of-the-art baselines, including the original unigram BM25 model, the Markov Random Field model, and the positional language model. According to the experimental results, the window-based N-gram Counting method and Survival Analysis over an exponential distribution are the most effective of the four proposed methods, leading to marked improvement over the baselines. This shows that the use of term proximity considerably enhances the retrieval effectiveness of the classical probabilistic models. It is therefore recommended to deploy a term proximity component in retrieval systems that employ probabilistic models. Crown Copyright (C) 2011 Published by Elsevier Inc. All rights reserved.
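As a rough illustration of what window-based counting of query-term co-occurrences can look like, the sketch below counts the sliding windows of a document that contain every query term. It is not the BM25P formulation from the paper, whose details are not given in this abstract; the function name `count_windows_with_terms`, the tokenized input, and the choice of window size are all assumptions made for the example.

```python
def count_windows_with_terms(tokens, query_terms, window_size):
    """Count sliding windows of `window_size` tokens containing every query term.

    Hypothetical helper for illustration only; not the paper's exact method.
    """
    required = set(query_terms)
    if window_size <= 0 or not required:
        return 0
    # Assumption: a document shorter than the window counts as one window.
    n_windows = max(len(tokens) - window_size + 1, 1)
    count = 0
    for start in range(n_windows):
        window = set(tokens[start:start + window_size])
        # The window qualifies only if all query terms appear in it.
        if required <= window:
            count += 1
    return count


if __name__ == "__main__":
    doc = "term proximity improves retrieval when query term pairs occur close together".split()
    print(count_windows_with_terms(doc, ["query", "term"], window_size=3))
```

A count of this kind could in principle feed a proximity feature alongside the usual unigram term frequency; how such a feature is weighted or normalized inside BM25 is left open here.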
Pages: 3017-3031
Number of pages: 15
Related papers
50 in total
  • [1] Modeling Term Associations for Probabilistic Information Retrieval
    Zhao, Jiashu
    Huang, Jimmy Xiangji
    Ye, Zheng
    [J]. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2014, 32 (02)
  • [2] PROBABILISTIC MODELS IN INFORMATION-RETRIEVAL
    FUHR, N
    [J]. COMPUTER JOURNAL, 1992, 35 (03): 243-255
  • [3] ON MODELING INFORMATION-RETRIEVAL WITH PROBABILISTIC INFERENCE
    WONG, SKM
    YAO, YY
    [J]. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 1995, 13 (01): 38-68
  • [4] On event spaces and probabilistic models in information retrieval
    Robertson, S
    [J]. INFORMATION RETRIEVAL, 2005, 8 (02): 319-329
  • [5] PROBABILISTIC MODELS IN INFORMATION-RETRIEVAL SYSTEMS
    PANYR, J
    [J]. NACHRICHTEN FUR DOKUMENTATION, 1986, 37 (02): 60-66
  • [6] On Event Spaces and Probabilistic Models in Information Retrieval
    Stephen Robertson
    [J]. Information Retrieval, 2005, 8: 319-329
  • [7] Exploring Term Proximity Statistic for Arabic Information Retrieval
    El Mandaouy, Abdelkader
    Gaussier, Eric
    El Alaoui, Said Ouatik
    [J]. 2014 THIRD IEEE INTERNATIONAL COLLOQUIUM IN INFORMATION SCIENCE AND TECHNOLOGY (CIST'14), 2014: 272-277
  • [8] An Enhanced Context-sensitive Proximity Model for Probabilistic Information Retrieval
    Zhao, Jiashu
    Huang, Jimmy Xiangji
    [J]. SIGIR'14: PROCEEDINGS OF THE 37TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2014: 1131-1134
  • [9] Rewarding Term Location Information to Enhance Probabilistic Information Retrieval
    Zhao, Jiashu
    Huang, Jimmy Xiangji
    Wu, Shicheng
    [J]. SIGIR 2012: PROCEEDINGS OF THE 35TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2012: 1137-1138
  • [10] Using Term Location Information to Enhance Probabilistic Information Retrieval
    Liu, Baiyan
    An, Xiangdong
    Huang, Jimmy Xiangji
    [J]. SIGIR 2015: PROCEEDINGS OF THE 38TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2015: 883-886