Modeling Term Associations for Probabilistic Information Retrieval

Cited by: 34
Authors
Zhao, Jiashu [1]
Huang, Jimmy Xiangji [2]
Ye, Zheng [2]
Affiliations
[1] York Univ, Informat Retrieval & Knowledge Management Res Lab, Dept Comp Sci & Engn, N York, ON M3J 1P3, Canada
[2] York Univ, Informat Retrieval & Knowledge Management Res Lab, Sch Informat Technol, N York, ON M3J 1P3, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Theory; Experimentation; Algorithms; Performance; Cross term; BM25; probabilistic information retrieval; kernel; term association; N-gram; PERFORMANCE; PROXIMITY;
DOI
10.1145/2590988
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Many probabilistic retrieval models traditionally assume that query terms are independent. Although such models can achieve reasonably good performance, terms are often associated with one another from a human reader's point of view. Some recent studies investigate how to model term associations/dependencies with proximity measures. However, a theoretical treatment of term associations within the probabilistic retrieval framework remains largely unexplored. In this article, we introduce a new concept, the cross term, to model term proximity, with the aim of boosting retrieval performance. With cross terms, the association of multiple query terms can be modeled in the same way as a simple unigram term. In particular, an occurrence of a query term is assumed to have an impact on its neighboring text, and the degree of this impact gradually weakens with increasing distance from the place of occurrence. We use shape functions to characterize such impacts. Based on this assumption, we first propose a bigram CRoss TErm Retrieval (CRTER2) model as the basis model, and then recursively propose a generalized n-gram CRoss TErm Retrieval (CRTERn) model for n query terms, where n > 2. Specifically, a bigram cross term occurs when the corresponding query terms appear close to each other, and its impact can be modeled by the intersection of the respective shape functions of the query terms. For an n-gram cross term, we develop several distance metrics with different properties and employ them in the proposed models for ranking. We also show how to extend the language model using the newly proposed cross terms. Extensive experiments on a number of TREC collections demonstrate the effectiveness of our proposed models.
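The shape-function idea in the abstract can be illustrated with a small Python sketch. It is not the paper's implementation: the Gaussian shape, the `sigma` width, the summation over all occurrence pairs, and every function name below are assumptions chosen for illustration. The only idea taken from the abstract is that the impact of a bigram cross term is read off where the two query terms' shape functions intersect. For two identical, symmetric, decreasing shapes, that intersection lies midway between the two occurrences.

```python
import math


def gaussian_shape(distance: float, sigma: float = 2.0) -> float:
    """Impact of a query-term occurrence at `distance` positions away.

    The Gaussian form and the default sigma are illustrative assumptions.
    """
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


def bigram_cross_term_impact(pos_a: int, pos_b: int, sigma: float = 2.0) -> float:
    """Height at which the two shape functions intersect.

    Both shapes are identical and symmetric, so the curves cross midway
    between the occurrences, i.e. at half the distance from each one.
    """
    return gaussian_shape(abs(pos_a - pos_b) / 2.0, sigma)


def cross_term_frequency(positions_a, positions_b, sigma: float = 2.0) -> float:
    """Sum the impact over every pair of occurrences in one document.

    Summing over all pairs is a hypothetical aggregation used only here.
    """
    return sum(
        bigram_cross_term_impact(a, b, sigma)
        for a in positions_a
        for b in positions_b
    )


if __name__ == "__main__":
    # Hypothetical word positions of two query terms in one document.
    near = cross_term_frequency([3], [4])
    far = cross_term_frequency([3], [30])
    print(near > far)  # closer occurrences yield a larger cross-term impact
```

Under these assumptions the resulting value behaves like a fractional term frequency, which is how a cross term could then be scored in the same way as an ordinary unigram term.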
Pages: 47