Modeling Term Associations for Probabilistic Information Retrieval

Times cited: 34

Authors:

Zhao, Jiashu [1]

Huang, Jimmy Xiangji [2]

Ye, Zheng [2]

Affiliations:

[1] York Univ, Informat Retrieval & Knowledge Management Res Lab, Dept Comp Sci & Engn, N York, ON M3J 1P3, Canada

[2] York Univ, Informat Retrieval & Knowledge Management Res Lab, Sch Informat Technol, N York, ON M3J 1P3, Canada

Source:

ACM TRANSACTIONS ON INFORMATION SYSTEMS | 2014 / Vol. 32 / No. 2

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

Theory; Experimentation; Algorithms; Performance; Cross term; BM25; probabilistic information retrieval; kernel; term association; N-gram; PERFORMANCE; PROXIMITY;

DOI:

10.1145/2590988

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Traditionally, many probabilistic retrieval models assume that query terms are independent. Although such models can achieve reasonably good performance, from a human point of view associations can exist among terms. Some recent studies investigate how to model term associations/dependencies with proximity measures. However, the theoretical modeling of term associations within the probabilistic retrieval framework remains largely unexplored. In this article, we introduce a new concept, the cross term, to model term proximity, with the aim of boosting retrieval performance. With cross terms, the association of multiple query terms can be modeled in the same way as a simple unigram term. In particular, an occurrence of a query term is assumed to have an impact on its neighboring text, and the degree of this impact gradually weakens with increasing distance from the place of occurrence. We use shape functions to characterize such impacts. Based on this assumption, we first propose a bigram CRoss TErm Retrieval (CRTER2) model as the basis model, and then recursively propose a generalized n-gram CRoss TErm Retrieval (CRTERn) model for n query terms, where n > 2. Specifically, a bigram cross term occurs when the corresponding query terms appear close to each other, and its impact can be modeled by the intersection of the respective shape functions of the query terms. For an n-gram cross term, we develop several distance metrics with different properties and employ them in the proposed models for ranking. We also show how to extend the language model using the newly proposed cross terms. Extensive experiments on a number of TREC collections demonstrate the effectiveness of our proposed models.
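The abstract describes the bigram cross term only in outline. What follows is a minimal sketch of one possible reading, written under explicit assumptions. It is not the authors' CRTER2 implementation. All function names (`triangle_kernel`, `gaussian_kernel`, `bigram_cross_term_tf`, `bm25_weight`) are hypothetical. The choice of triangle and Gaussian shape functions, the width `sigma`, summing over every pair of occurrences, and plugging the result into a standard BM25 weight are all assumptions and are not stated in the abstract. The geometric fact being used is simple. Two identical kernels that are symmetric and decrease with distance, centred at positions p and q, intersect at the midpoint, where each has height kernel(|p - q| / 2).

```python
import math
from typing import Callable, Sequence

# A shape function maps (distance from the occurrence, width sigma) to an impact in [0, 1].
Kernel = Callable[[float, float], float]


def triangle_kernel(u: float, sigma: float) -> float:
    """Assumed triangle shape function: 1 at the occurrence, falling linearly to 0 at distance sigma."""
    return max(0.0, 1.0 - abs(u) / sigma)


def gaussian_kernel(u: float, sigma: float) -> float:
    """Assumed Gaussian shape function with spread sigma (positive at every distance)."""
    return math.exp(-(u * u) / (2.0 * sigma * sigma))


def bigram_cross_term_tf(
    positions_a: Sequence[int],
    positions_b: Sequence[int],
    sigma: float,
    kernel: Kernel = triangle_kernel,
) -> float:
    """Pseudo term frequency of the bigram cross term (a, b) in one document.

    Identical symmetric kernels centred at p and q intersect at the midpoint,
    where each has height kernel(|p - q| / 2, sigma). Summing this intersection
    height over all occurrence pairs is an assumption of this sketch.
    """
    total = 0.0
    for p in positions_a:
        for q in positions_b:
            total += kernel(abs(p - q) / 2.0, sigma)
    return total


def bm25_weight(
    tf: float,
    df: float,
    n_docs: int,
    doc_len: float,
    avg_doc_len: float,
    k1: float = 1.2,
    b: float = 0.75,
) -> float:
    """Standard BM25 weight, here applied to a cross term's pseudo tf and (assumed) df."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


# Hypothetical document: "information" at positions 0 and 10, "retrieval" at positions 1 and 30.
tf_cross = bigram_cross_term_tf([0, 10], [1, 30], sigma=5.0)
print(round(tf_cross, 6))
```

With `sigma=5.0` and the triangle kernel, only two pairs lie close enough to overlap. The pair (0, 1) contributes 0.9 and the pair (10, 1) contributes 0.1, so the script prints `1.0`. A Gaussian kernel would also give small positive contributions for distant pairs, because it never reaches zero. The resulting pseudo frequency could then be weighted like an ordinary unigram, for example through `bm25_weight`. How CRTER2 actually defines the cross term's document frequency and combines it with unigram scores is not specified in the abstract.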

Pages: 47