A New Term Frequency Normalization Model for Probabilistic Information Retrieval

Cited by: 3
Authors
Jian, Fanghong
Huang, Jimmy Xiangji [1]
Zhao, Jiashu
He, Tingting
Affiliations
[1] Central China Normal University, Information Retrieval & Knowledge Management Research Lab, Wuhan, Hubei, People's Republic of China
Funding
National Natural Science Foundation of China; Natural Sciences and Engineering Research Council of Canada
Keywords
Term Frequency Normalization; BM25; Probabilistic Model
DOI
10.1145/3209978.3210147
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In the probabilistic BM25 model, term frequency normalization is one of the key components. It is often controlled by the parameters k1 and b, which need to be optimized for each given data set. In this paper, we hypothesize, and show empirically, that term frequency normalization should be specific to query length in order to optimize retrieval performance. Following this intuition, we first propose a new term frequency normalization that takes query length into account for probabilistic information retrieval, namely BM25(QL). BM25(QL) is then incorporated into the state-of-the-art models CRTER2 and LDA-BM25, yielding CRTER2(QL) and LDA-BM25(QL), respectively. A series of experiments shows that the proposed approaches BM25(QL), CRTER2(QL) and LDA-BM25(QL) are comparable to BM25, CRTER2 and LDA-BM25 with the optimal b setting in terms of MAP on all the data sets.
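For orientation, the role that k1 and b play in term frequency normalization can be sketched with the standard textbook BM25 term frequency component. This is not the BM25(QL) variant proposed in the paper, whose formula is not given in the abstract. The function name, parameter names, and default values (k1 = 1.2, b = 0.75) are illustrative assumptions, not taken from the record.

```python
def bm25_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Standard BM25 term-frequency component (assumed textbook form).

    tf          -- raw count of the term in the document
    doc_len     -- length of the document in tokens
    avg_doc_len -- average document length in the collection
    k1          -- controls how quickly the score saturates as tf grows
    b           -- controls how strongly document length is penalized
    """
    if tf <= 0:
        return 0.0
    # Document-length normalization: equals 1.0 for an average-length document.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Saturating transform: bounded above by k1 + 1 no matter how large tf is.
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


if __name__ == "__main__":
    # Same term count, different document lengths (hypothetical numbers).
    for doc_len in (50, 100, 400):
        print(doc_len, bm25_tf(tf=3, doc_len=doc_len, avg_doc_len=100))
```

With b = 0 the document length has no effect, and with b = 1 it is fully normalized. This is why b is typically tuned per data set, which is the tuning the abstract aims to avoid by tying normalization to query length.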
Pages: 1237-1240
Page count: 4