A general language model for information retrieval

Cited by: 154
Authors
Song, F [1]
Croft, WB [1]
Affiliations
[1] Univ Guelph, Dept Comp & Informat Sci, Guelph, ON N1G 2W1, Canada
Keywords
statistical language modeling; Good-Turing estimate; curve-fitting functions; model combinations
DOI
10.1145/319950.320022
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Statistical language modeling has been successfully used for speech recognition, part-of-speech tagging, and syntactic parsing. Recently, it has also been applied to information retrieval. Under this new paradigm, each document is viewed as a language sample and a query as a generation process. Retrieved documents are ranked by the probability that the language model of each document produces the query. In this paper, we present a new language model for information retrieval, based on a range of data smoothing techniques, including the Good-Turing estimate, curve-fitting functions, and model combinations. Our model is conceptually simple and intuitive, and can be easily extended to incorporate probabilities of phrases such as word pairs and word triples. Experiments with the Wall Street Journal and TREC4 data sets showed that the performance of our model is comparable to that of INQUERY and better than that of another language model for information retrieval. In particular, word pairs are shown to be useful in improving retrieval performance.
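The ranking idea in the abstract (score each document by the probability that its language model generates the query) can be illustrated with a minimal sketch. This is not the paper's method: it assumes a plain unigram model and uses simple linear interpolation between document and corpus estimates as a stand-in for the Good-Turing and curve-fitting smoothing the abstract names. The function names, the weight `lam`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, corpus_counts, corpus_len, lam=0.5):
    """Return log P(query | doc) under a smoothed unigram model.

    Assumption: smoothing is a linear mix of the document estimate and
    the corpus estimate, weighted by lam (not the paper's Good-Turing scheme).
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_corpus = corpus_counts[term] / corpus_len if corpus_len else 0.0
        p = lam * p_doc + (1.0 - lam) * p_corpus
        if p == 0.0:
            # The term occurs nowhere in the corpus, so no model can generate it.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Rank documents by query log-likelihood, best first."""
    corpus_counts = Counter(t for tokens in docs.values() for t in tokens)
    corpus_len = sum(corpus_counts.values())
    scored = [
        (name, query_log_likelihood(query, tokens, corpus_counts, corpus_len, lam))
        for name, tokens in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy collection (hypothetical data, for illustration only).
docs = {
    "d1": "stock market prices fell sharply".split(),
    "d2": "language model for speech recognition".split(),
    "d3": "language model smoothing for retrieval".split(),
}
ranking = rank("language model retrieval".split(), docs)
```

Here `ranking` places `d3` first because it contains all three query terms. `d2` follows: it lacks "retrieval", but smoothing with the corpus estimate keeps its score finite instead of zeroing it out.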
Pages: 316-321
Page count: 6