A RETRIEVAL-SYSTEM BASED ON AUTOMATIC RELEVANCE WEIGHTING OF SEARCH TERMS

Author
WILBUR, WJ
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
We have developed a retrieval methodology based on relevance weighting of search terms with an automatic implementation. It is founded on the familiar Bayesian formulation of the probability of relevance as a function of term occurrence, where the contributions from individual terms are assumed to be independent. However, our formulation departs from the usual in that it is based on considering pairs of documents, and terms contribute to the score (log odds of relevance) in a symmetric manner. Terms that appear in both documents of a pair contribute positively to the score, and terms that occur in only one document contribute negatively. This allows the length of a document to become a negative factor in the scoring, much as it is in the cosine formula commonly used in vector retrieval. As a consequence, each term has two weights associated with it: one positive, which is its contribution when it occurs in both documents of a pair, and one negative, which is its contribution when it occurs in only one document of a pair. A method is found to incorporate local term weights, based on within-document term frequencies, into this retrieval scheme. We term the result the relevance pairs (RP) model. We obtain the necessary statistics to estimate the weights in the model by substituting the set of all highly rated document pairs identified by a successful retrieval method for the set of all relevant pairs of documents. For the successful retrieval method we use the vector cosine (VC) model. In our test environment we find improved retrieval by the RP model when compared with the VC model.
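The symmetric pair scoring described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the function name `rp_score`, the representation of documents as term sets, and the numeric weights are hypothetical and are not taken from the paper. The sketch also leaves out the local (within-document term frequency) weights and the step that estimates the weights from highly rated VC pairs, since the abstract does not give those formulas.

```python
from typing import Dict, Set, Tuple

# Hypothetical per-term weights: (w_both, w_one).
# w_both > 0 applies when the term occurs in both documents of a pair,
# w_one < 0 applies when it occurs in only one of them.
Weights = Dict[str, Tuple[float, float]]


def rp_score(doc_a: Set[str], doc_b: Set[str], weights: Weights) -> float:
    """Sum independent term contributions for a document pair.

    The result plays the role of a log-odds-of-relevance score; it is
    symmetric in doc_a and doc_b by construction.
    """
    score = 0.0
    for term in doc_a | doc_b:
        if term not in weights:
            continue  # terms without estimated weights are ignored here
        w_both, w_one = weights[term]
        if term in doc_a and term in doc_b:
            score += w_both  # shared term: positive contribution
        else:
            score += w_one   # term in only one document: negative contribution
    return score


# Made-up weights, chosen only to show the mechanics.
weights: Weights = {
    "retrieval": (2.0, -0.5),
    "bayesian": (3.0, -1.0),
    "cosine": (1.5, -0.25),
}

query = {"retrieval", "bayesian"}
short_doc = {"retrieval", "bayesian"}
long_doc = {"retrieval", "bayesian", "cosine"}

print(rp_score(query, short_doc, weights))  # 5.0
print(rp_score(query, long_doc, weights))   # 4.75
```

The longer document scores lower because its extra, unmatched term adds a negative contribution, which is how document length becomes a negative factor in this kind of scheme.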
Pages: 216-220
Number of pages: 5