A general language model for information retrieval

Cited by: 154
Authors
Song, F [1]
Croft, WB [1]
Affiliations
[1] Univ Guelph, Dept Comp & Informat Sci, Guelph, ON N1G 2W1, Canada
Keywords
statistical language modeling; Good-Turing estimate; curve-fitting functions; model combinations
DOI
10.1145/319950.320022
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Statistical language modeling has been successfully used for speech recognition, part-of-speech tagging, and syntactic parsing. Recently, it has also been applied to information retrieval. According to this new paradigm, each document is viewed as a language sample, and a query as a generation process. The retrieved documents are ranked based on the probabilities of producing a query from the corresponding language models of these documents. In this paper, we will present a new language model for information retrieval, which is based on a range of data smoothing techniques, including the Good-Turing estimate, curve-fitting functions, and model combinations. Our model is conceptually simple and intuitive, and can be easily extended to incorporate probabilities of phrases such as word pairs and word triples. The experiments with the Wall Street Journal and TREC4 data sets showed that the performance of our model is comparable to that of INQUERY and better than that of another language model for information retrieval. In particular, word pairs are shown to be useful in improving the retrieval performance.
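The ranking idea described in the abstract (score each document by the probability that its language model generates the query) can be illustrated with a minimal sketch. It assumes a unigram model and a simple linear interpolation between a document model and a corpus model as one possible "model combination"; it is not the authors' exact model, and the Good-Turing estimate, curve-fitting functions, and word-pair probabilities are omitted. The names `score`, `rank`, and the weight `lam` are hypothetical.

```python
import math
from collections import Counter


def score(query, doc, corpus_counts, corpus_len, lam=0.5):
    """Log-probability of generating `query` from a smoothed unigram model of `doc`.

    Assumption: smoothing is a plain linear interpolation with weight `lam`
    between the document model and the corpus model (not Good-Turing).
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    log_p = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_corpus = corpus_counts[term] / corpus_len
        p = lam * p_doc + (1 - lam) * p_corpus
        if p == 0.0:
            # Term unseen in the whole corpus: the query cannot be generated.
            return float("-inf")
        log_p += math.log(p)
    return log_p


def rank(query, docs, lam=0.5):
    """Return document indices sorted by decreasing query likelihood."""
    corpus_counts = Counter(term for doc in docs for term in doc)
    corpus_len = sum(corpus_counts.values())
    scored = [
        (score(query, doc, corpus_counts, corpus_len, lam), i)
        for i, doc in enumerate(docs)
    ]
    # Higher score first; ties are broken by document index.
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


docs = [
    ["stock", "market", "falls"],
    ["language", "model", "retrieval"],
    ["market", "report"],
]
print(rank(["language", "model"], docs))
```

Without the corpus component, any document missing a single query term would receive probability zero; the interpolation is what lets such documents still be ranked.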
Pages: 316-321
Number of pages: 6
Related papers
50 in total
  • [1] A general language model for information retrieval
    Song, F
    Croft, WB
    [J]. SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 1999: 279 - 280
  • [2] A Proximity Language Model for Information Retrieval
    Zhao, Jinglei
    Yun, Yeogirl
    [J]. PROCEEDINGS 32ND ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2009: 291 - 298
  • [3] A novel dependency language model for information retrieval
    Ke-ke Cai
    Jia-jun Bu
    Chun Chen
    Guang Qiu
    [J]. Journal of Zhejiang University-SCIENCE A, 2007, 8 : 871 - 882
  • [5] Word sense language model for information retrieval
    Gao, Liqi
    Zhang, Yu
    Liu, Ting
    Liu, Guiping
    [J]. INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2006, 4182 : 158 - 171
  • [6] LSM: Language sense model for information retrieval
    Bao, Shenghua
    Zhang, Lei
    Chen, Erdong
    Long, Min
    Li, Rui
    Yu, Yong
    [J]. ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2006, 4016 : 97 - 108
  • [7] A novel dependency language model for information retrieval
    Cai Ke-ke
    Bu Jia-jun
    Chen Chun
    Qiu Guang
    [J]. JOURNAL OF ZHEJIANG UNIVERSITY-SCIENCE A, 2007, 8 (06): 871 - 882
  • [8] Personalization Information Retrieval Based on Unigram Language Model
    Yu Yangxin
    [J]. MECHATRONICS AND INDUSTRIAL INFORMATICS, PTS 1-4, 2013, 321-324 : 2269 - 2273
  • [9] Large Language Model Powered Agents for Information Retrieval
    Zhang, An
    Deng, Yang
    Lin, Yankai
    Chen, Xu
    Wen, Ji-Rong
    Chua, Tat-Seng
    [J]. PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024: 2989 - 2992
  • [10] Language Model Adaptation for Relevance Feedback in Information Retrieval
    Chang, Ying-Lang
    Chien, Jen-Tzung
    [J]. 2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008: 289 - 292