Clusters, Language Models, and ad hoc Information Retrieval

Cited by: 11
Authors
Kurland, Oren [1]
Lee, Lillian [2]
Affiliations
[1] Technion Israel Inst Technol, Fac Ind Engn & Management, IL-32000 Haifa, Israel
[2] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
Keywords
Algorithms; Experimentation; Language modeling; aspect models; interpolation model; clustering; smoothing; cluster-based language models; cluster hypothesis
DOI
10.1145/1508850.1508851
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The language-modeling approach to information retrieval provides an effective statistical framework for tackling various problems and often achieves impressive empirical performance. However, most previous work on language models for information retrieval focused on document-specific characteristics, and therefore did not take into account the structure of the surrounding corpus, a potentially rich source of additional information. We propose a novel algorithmic framework in which information provided by document-based language models is enhanced by the incorporation of information drawn from clusters of similar documents. Using this framework, we develop a suite of new algorithms. Even the simplest typically outperforms the standard language-modeling approach in terms of mean average precision (MAP) and recall, and our new interpolation algorithm posts statistically significant performance improvements for both metrics over all six corpora tested. An important aspect of our work is the way we model corpus structure. In contrast to most previous work on cluster-based retrieval that partitions the corpus, we demonstrate the effectiveness of a simple strategy based on a nearest-neighbors approach that produces overlapping clusters.
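The abstract describes two ideas that a small example can make concrete: overlapping clusters built from nearest neighbors, and a retrieval score that interpolates a document's own language model with the language models of the clusters it belongs to. The following is a minimal sketch under stated assumptions, not the authors' algorithm: the function names, the cosine similarity measure, the additive smoothing, the uniform averaging over clusters, and the parameters `k`, `lam`, and `mu` are all illustrative choices that do not come from the record.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab, mu=1.0):
    """Additively smoothed unigram model over a fixed vocabulary (assumed smoothing)."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def query_likelihood(lm, query):
    """Probability the model assigns to the query, treating terms as independent."""
    return math.prod(lm[w] for w in query)


def cosine(a, b):
    """Cosine similarity between two token lists, using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_clusters(docs, k):
    """One cluster per document: the document plus its k-1 nearest neighbors.

    Because every document seeds its own cluster, clusters overlap
    instead of partitioning the corpus.
    """
    clusters = []
    for i, d in enumerate(docs):
        ranked = sorted(range(len(docs)), key=lambda j: cosine(d, docs[j]), reverse=True)
        clusters.append([i] + [j for j in ranked if j != i][: k - 1])
    return clusters


def interpolated_scores(docs, query, k=2, lam=0.5):
    """Score = lam * document model + (1 - lam) * average of the models of
    the clusters containing the document (a simplified interpolation)."""
    vocab = sorted({w for d in docs for w in d} | set(query))
    doc_lms = [unigram_lm(d, vocab) for d in docs]
    clusters = knn_clusters(docs, k)
    cluster_lms = [unigram_lm([w for j in c for w in docs[j]], vocab) for c in clusters]
    scores = []
    for i in range(len(docs)):
        own = query_likelihood(doc_lms[i], query)
        # Each document is in at least its own cluster, so this list is never empty.
        member_of = [c for c, members in enumerate(clusters) if i in members]
        pooled = sum(query_likelihood(cluster_lms[c], query) for c in member_of) / len(member_of)
        scores.append(lam * own + (1 - lam) * pooled)
    return scores


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["cluster", "retrieval", "model"],
    ["language", "model", "retrieval"],
    ["cooking", "pasta", "recipe"],
]
query = ["retrieval", "model"]
scores = interpolated_scores(docs, query, k=2, lam=0.5)
```

Setting `lam=1.0` recovers plain document-based query likelihood, which makes it easy to compare the two scoring modes on the same corpus.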
Pages: 39