Clusters, Language Models, and ad hoc Information Retrieval

Cited by: 11
Authors
Kurland, Oren [1]
Lee, Lillian [2]
Affiliations
[1] Technion Israel Inst Technol, Fac Ind Engn & Management, IL-32000 Haifa, Israel
[2] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
Keywords
Algorithms; Experimentation; Language modeling; aspect models; interpolation model; clustering; smoothing; cluster-based language models; cluster hypothesis
DOI
10.1145/1508850.1508851
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The language-modeling approach to information retrieval provides an effective statistical framework for tackling various problems and often achieves impressive empirical performance. However, most previous work on language models for information retrieval focused on document-specific characteristics, and therefore did not take into account the structure of the surrounding corpus, a potentially rich source of additional information. We propose a novel algorithmic framework in which information provided by document-based language models is enhanced by the incorporation of information drawn from clusters of similar documents. Using this framework, we develop a suite of new algorithms. Even the simplest typically outperforms the standard language-modeling approach in terms of mean average precision (MAP) and recall, and our new interpolation algorithm posts statistically significant performance improvements for both metrics over all six corpora tested. An important aspect of our work is the way we model corpus structure. In contrast to most previous work on cluster-based retrieval that partitions the corpus, we demonstrate the effectiveness of a simple strategy based on a nearest-neighbors approach that produces overlapping clusters.
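The abstract names two ingredients, overlapping nearest-neighbor clusters and an interpolation of document-based and cluster-based language-model evidence, without giving formulas. The following is a minimal sketch of how such a combination could look, not the authors' algorithm. All function names, the additive smoothing, the cosine similarity, and the uniform averaging over a document's clusters are illustrative assumptions.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab, mu=1.0):
    """Additively smoothed unigram model (smoothing choice is an assumption)."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def query_likelihood(lm, query):
    """Probability the model assigns to the query terms, treated independently."""
    p = 1.0
    for w in query:
        p *= lm.get(w, 0.0)
    return p


def cosine(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_clusters(docs, k):
    """One cluster per document: the document plus its k-1 nearest neighbors.

    Because every document seeds its own cluster, clusters may overlap,
    unlike a partition of the corpus.
    """
    clusters = []
    for i, d in enumerate(docs):
        others = [j for j in range(len(docs)) if j != i]
        others.sort(key=lambda j: cosine(d, docs[j]), reverse=True)
        clusters.append(sorted([i] + others[: k - 1]))
    return clusters


def interpolated_scores(docs, query, k=2, lam=0.5):
    """Score = lam * document evidence + (1 - lam) * cluster evidence.

    Cluster evidence is the uniform average of query likelihoods over all
    clusters containing the document (a simplifying assumption).
    """
    vocab = sorted({w for d in docs for w in d} | set(query))
    doc_lms = [unigram_lm(d, vocab) for d in docs]
    clusters = knn_clusters(docs, k)
    cluster_q = [
        query_likelihood(unigram_lm([w for j in c for w in docs[j]], vocab), query)
        for c in clusters
    ]
    scores = []
    for i in range(len(docs)):
        own = [ci for ci, c in enumerate(clusters) if i in c]
        cluster_part = sum(cluster_q[ci] for ci in own) / len(own)
        doc_part = query_likelihood(doc_lms[i], query)
        scores.append(lam * doc_part + (1 - lam) * cluster_part)
    return scores


docs = [
    ["language", "model", "retrieval"],
    ["language", "model", "smoothing"],
    ["cluster", "hypothesis"],
    ["cluster", "retrieval", "model"],
]
query = ["language", "model"]
scores = interpolated_scores(docs, query, k=2, lam=0.5)
```

Setting `lam=1.0` in this sketch reduces it to plain document-based query likelihood, which makes it easy to compare against a baseline without cluster information.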
Pages: 39
Related papers
50 in total
  • [1] Topic based language models for ad hoc information retrieval
    Azzopardi, L
    Girolami, M
    van Rijsbergen, CJ
    2004 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS, 2004, : 3281 - 3286
  • [2] Topic signature language models for ad hoc retrieval
    Zhou, Xiaohua
    Hu, Xiaohua
    Zhang, Xiaodan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2007, 19 (09) : 1276 - 1287
  • [3] Topic Models Ensembles for AD-HOC Information Retrieval
    Ormeno, Pablo
    Mendoza, Marcelo
    Valle, Carlos
    INFORMATION, 2021, 12 (09)
  • [4] Ad Hoc Retrieval with the Persian Language
    Dolamic, Ljiljana
    Savoy, Jacques
    MULTILINGUAL INFORMATION ACCESS EVALUATION I: TEXT RETRIEVAL EXPERIMENTS, 2010, 6241 : 102 - 109
  • [5] Ad Hoc Information Retrieval for Persian
    Habibian, AmirHossein
    AleAhmad, Abolfazl
    Shakery, Azadeh
    MULTILINGUAL INFORMATION ACCESS EVALUATION I: TEXT RETRIEVAL EXPERIMENTS, 2010, 6241 : 110 - 119
  • [6] An Ad Hoc Information Retrieval Perspective on PLSI through Language Model Identification
    Chappelier, Jean-Cedric
    Eckard, Emmanuel
    ADVANCES IN INFORMATION RETRIEVAL THEORY, 2009, 5766 : 346 - 349
  • [7] Estimation of Statistical Translation Models Based on Mutual Information for Ad Hoc Information Retrieval
    Karimzadehgan, Maryam
    Zhai, ChengXiang
    SIGIR 2010: PROCEEDINGS OF THE 33RD ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH DEVELOPMENT IN INFORMATION RETRIEVAL, 2010, : 323 - 330
  • [8] Applying light natural language processing to ad-hoc cross language information retrieval
    Lioma, Christina
    Macdonald, Craig
    He, Ben
    Plachouras, Vassilis
    Ounis, Iadh
    ACCESSING MULTILINGUAL INFORMATION REPOSITORIES, 2006, 4022 : 170 - 178
  • [9] Utilizing passage-based language models for ad hoc document retrieval
    Bendersky, Michael
    Kurland, Oren
    INFORMATION RETRIEVAL, 2010, 13 (02) : 157 - 187