Clusters, Language Models, and ad hoc Information Retrieval

Cited by: 11
Authors
Kurland, Oren [1]
Lee, Lillian [2]
Affiliations
[1] Technion Israel Inst Technol, Fac Ind Engn & Management, IL-32000 Haifa, Israel
[2] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
Keywords
Algorithms; Experimentation; Language modeling; aspect models; interpolation model; clustering; smoothing; cluster-based language models; cluster hypothesis
DOI
10.1145/1508850.1508851
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The language-modeling approach to information retrieval provides an effective statistical framework for tackling various problems and often achieves impressive empirical performance. However, most previous work on language models for information retrieval focused on document-specific characteristics, and therefore did not take into account the structure of the surrounding corpus, a potentially rich source of additional information. We propose a novel algorithmic framework in which information provided by document-based language models is enhanced by the incorporation of information drawn from clusters of similar documents. Using this framework, we develop a suite of new algorithms. Even the simplest typically outperforms the standard language-modeling approach in terms of mean average precision (MAP) and recall, and our new interpolation algorithm posts statistically significant performance improvements for both metrics over all six corpora tested. An important aspect of our work is the way we model corpus structure. In contrast to most previous work on cluster-based retrieval that partitions the corpus, we demonstrate the effectiveness of a simple strategy based on a nearest-neighbors approach that produces overlapping clusters.
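The abstract names two ingredients without giving formulas: overlapping clusters built with a nearest-neighbors approach, and an interpolation of document-based and cluster-based language-model information. The following is a minimal sketch of that general idea, not the authors' algorithm. Every function name, the additive smoothing, the cosine similarity over raw term counts, the averaging over a document's clusters, and the weight `lam` are assumptions made for illustration.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab, mu=1.0):
    """Additively smoothed unigram model over a fixed vocabulary (assumed smoothing)."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_clusters(docs, k):
    """One cluster per document: the document plus its k nearest neighbors.

    Because a document can be a neighbor of several others, clusters overlap.
    """
    vecs = [Counter(d) for d in docs]
    clusters = []
    for i, v in enumerate(vecs):
        others = [j for j in range(len(docs)) if j != i]
        ranked = sorted(others, key=lambda j: cosine(v, vecs[j]), reverse=True)
        clusters.append([i] + ranked[:k])
    return clusters


def query_likelihood(lm, query):
    """Probability the model assigns to the query terms, treated as independent."""
    return math.prod(lm[w] for w in query)


def interpolated_scores(docs, query, k=2, lam=0.5):
    """Mix each document's own score with the mean score of the clusters containing it."""
    vocab = {w for d in docs for w in d} | set(query)
    doc_lms = [unigram_lm(d, vocab) for d in docs]
    clusters = knn_clusters(docs, k)
    cluster_lms = [
        unigram_lm([w for j in c for w in docs[j]], vocab) for c in clusters
    ]
    scores = []
    for i in range(len(docs)):
        # Every document belongs at least to the cluster built around itself.
        member_of = [ci for ci, c in enumerate(clusters) if i in c]
        cluster_score = sum(
            query_likelihood(cluster_lms[ci], query) for ci in member_of
        ) / len(member_of)
        doc_score = query_likelihood(doc_lms[i], query)
        scores.append(lam * doc_score + (1 - lam) * cluster_score)
    return scores


docs = [
    ["cluster", "retrieval", "language", "model"],
    ["language", "model", "smoothing"],
    ["cooking", "pasta", "recipe"],
    ["retrieval", "cluster", "hypothesis"],
]
query = ["cluster", "retrieval"]
scores = interpolated_scores(docs, query, k=2, lam=0.5)
```

Setting `lam=1.0` in this sketch reduces the score to the document-only query likelihood, which makes it easy to compare against a baseline that ignores corpus structure.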
Pages: 39