A New Term Frequency Normalization Model for Probabilistic Information Retrieval

Cited by: 3
Authors
Jian, Fanghong
Huang, Jimmy Xiangji [1]
Zhao, Jiashu
He, Tingting
Affiliations
[1] Central China Normal University, Information Retrieval & Knowledge Management Research Lab, Wuhan, Hubei, People's Republic of China
Source
Funding
National Natural Science Foundation of China; Natural Sciences and Engineering Research Council of Canada;
Keywords
Term Frequency Normalization; BM25; Probabilistic Model;
DOI
10.1145/3209978.3210147
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In probabilistic BM25, term frequency normalization is one of the key components. It is often controlled by the parameters k_1 and b, which need to be optimized for each given data set. In this paper, we assume and show empirically that term frequency normalization should be specific to query length in order to optimize retrieval performance. Following this intuition, we first propose a new term frequency normalization with query length for probabilistic information retrieval, namely BM25(QL). Then BM25(QL) is incorporated into the state-of-the-art models CRTER2 and LDA-BM25, denoted as CRTER2QL and LDA-BM25(QL) respectively. A series of experiments shows that our proposed approaches BM25(QL), CRTER2QL and LDA-BM25(QL) are comparable to BM25, CRTER2 and LDA-BM25 with the optimal b setting in terms of MAP on all the data sets.
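For orientation, the role of the two parameters named in the abstract can be sketched with the standard Okapi BM25 term-frequency component. This is the widely used baseline form, not the BM25(QL) variant proposed in the paper, whose formula is not given in this record. The function name `bm25_tf` and the default values k1 = 1.2 and b = 0.75 are illustrative assumptions.

```python
def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Standard Okapi BM25 term-frequency component (baseline form).

    tf          -- raw frequency of the term in the document
    doc_len     -- length of the document (in tokens)
    avg_doc_len -- average document length in the collection
    k1          -- controls how quickly the score saturates as tf grows
    b           -- controls the strength of document-length normalization
                   (b = 0 disables it, b = 1 applies it fully)
    The defaults are common illustrative choices, not values from the paper.
    """
    # Length normalization factor: equals 1 for a document of average length.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Saturating transform of tf; bounded above by k1 + 1.
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


if __name__ == "__main__":
    # Same term frequency, different document lengths.
    for doc_len in (50, 100, 200):
        print(doc_len, round(bm25_tf(3, doc_len, 100), 4))
```

With b > 0, a longer document receives a lower weight for the same raw term frequency; with b = 0, document length has no effect. The abstract's claim is that the best setting of this normalization varies with query length.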
Pages: 1237-1240
Number of pages: 4
Related papers
50 in total
  • [1] A topic-based term frequency normalization framework to enhance probabilistic information retrieval
    Jian, Fanghong
    Huang, Jimmy X.
    Zhao, Jiashu
    Ying, Zhiwei
    Wang, Yuqi
    [J]. COMPUTATIONAL INTELLIGENCE, 2020, 36 (02) : 486 - 521
  • [2] On setting the hyper-parameters of term frequency normalization for information retrieval
    He, Ben
    Ounis, Iadh
    [J]. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2007, 25 (03)
  • [3] Exploring term dependences in probabilistic information retrieval model
    Cho, BH
    Lee, C
    Lee, GG
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2003, 39 (04) : 505 - 519
  • [4] Lexical normalization and relationship alternatives for a term dependence model in information retrieval
    Gonzalez, M
    de Lima, VLS
    de Lima, JV
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2006, 3878 : 394 - 405
  • [5] A probabilistic information retrieval model by document ranking using term dependencies
    You, Hyun-Jo
    Lee, Jung-Jin
    [J]. KOREAN JOURNAL OF APPLIED STATISTICS, 2019, 32 (05) : 763 - 782
  • [6] A probabilistic model for distributed information retrieval
    Baumgarten, C
    [J]. PROCEEDINGS OF THE 20TH ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 1997, : 258 - 266
  • [7] Modeling Term Associations for Probabilistic Information Retrieval
    Zhao, Jiashu
    Huang, Jimmy Xiangji
    Ye, Zheng
    [J]. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2014, 32 (02)
  • [8] Analysis of Probabilistic model for Document Retrieval in Information Retrieval
    Tamrakar, Astha
    Vishwakarma, Santosh K.
    [J]. 2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN), 2015, : 760 - 765
  • [9] Term frequency - function of document frequency: a new term weighting scheme for enterprise information retrieval
    Zhang, Hui
    Wang, Deqing
    Wu, Wenjun
    Hu, Hongping
    [J]. ENTERPRISE INFORMATION SYSTEMS, 2012, 6 (04) : 433 - 444
  • [10] Rewarding Term Location Information to Enhance Probabilistic Information Retrieval
    Zhao, Jiashu
    Huang, Jimmy Xiangji
    Wu, Shicheng
    [J]. SIGIR 2012: PROCEEDINGS OF THE 35TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2012, : 1137 - 1138