A probabilistic information retrieval model by document ranking using term dependencies

Cited by: 0
Authors
You, Hyun-Jo [1]
Lee, Jung-Jin [2, 3]
Affiliations
[1] Seoul Natl Univ, Program Data Sci Humanities, Seoul, South Korea
[2] Soongsil Univ, Dept Stat & Actuarial Sci, 369 Sangdo Ro, Seoul 06978, South Korea
[3] ADA Univ, 11 Ahmadbay Agha Oglu St, AZ-1008 Baku, Azerbaijan
Keywords
information retrieval; document ranking; maximum entropy principle; iterative proportional fitting algorithm; PRINCIPLE
DOI
10.5351/KJAS.2019.32.5.763
CLC number (Chinese Library Classification)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
This paper proposes a probabilistic document ranking model that incorporates term dependencies. Document ranking is a fundamental information retrieval task: sorting the documents in a collection by their relevance to a user query (Qin et al., Information Retrieval Journal, 13, 346-374, 2010). A probabilistic model computes the conditional probability that each document is relevant given the query. Most widely used models assume term independence because the joint probabilities of multiple terms are difficult to compute, yet words in natural language text are clearly highly correlated. In this paper, we adopt a multinomial distribution model that accounts for the dependency structure of words when calculating a document's relevance probability, and we propose an information retrieval model that ranks documents by estimating this probability with the maximum entropy method. In ranking simulation experiments under various multinomial settings, the proposed model yields better retrieval results than a model that assumes word independence. Document ranking experiments on the real-world LETOR OHSUMED dataset also show better retrieval results.
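The abstract names maximum entropy estimation, and the keywords name the iterative proportional fitting (IPF) algorithm. The block below is a minimal sketch of how those two ideas fit together, not the authors' model. It assumes that each document is reduced to a binary vector of query-term occurrences, that the only dependence information used is pairwise co-occurrence, and that documents are ranked by the log-likelihood ratio between a "relevant" and a "non-relevant" distribution. The toy data, the 0.5 pseudo-count, and every function name (`pair_marginals`, `ipf_max_entropy`, `independence_model`, `log_odds`) are hypothetical.

```python
import itertools
import math

# Hypothetical toy setting (not from the paper): each document is reduced to a
# binary vector x with x[i] = 1 if query term i occurs in it.


def pair_marginals(docs, k):
    """Smoothed pairwise marginals P(X_i=a, X_j=b) for i < j.

    The +0.5 pseudo-count per cell is an assumption: it keeps every cell
    positive and keeps all pairs consistent with the same one-term marginals.
    """
    n = len(docs)
    marg = {}
    for i, j in itertools.combinations(range(k), 2):
        for a, b in itertools.product((0, 1), repeat=2):
            count = sum(1 for d in docs if d[i] == a and d[j] == b)
            marg[(i, j, a, b)] = (count + 0.5) / (n + 2.0)
    return marg


def ipf_max_entropy(marg, k, iters=1000, tol=1e-10):
    """Iterative proportional fitting started from the uniform distribution.

    Starting from uniform, IPF converges to the maximum-entropy joint
    distribution over {0,1}^k that reproduces every given pairwise marginal.
    """
    cells = list(itertools.product((0, 1), repeat=k))
    p = {x: 1.0 / len(cells) for x in cells}
    pairs = list(itertools.combinations(range(k), 2))
    for _ in range(iters):
        max_gap = 0.0
        for i, j in pairs:
            # Current 2x2 marginal table of (X_i, X_j) under p.
            current = dict.fromkeys(itertools.product((0, 1), repeat=2), 0.0)
            for x in cells:
                current[(x[i], x[j])] += p[x]
            # Rescale every cell so this pair's marginal matches its target.
            for x in cells:
                p[x] *= marg[(i, j, x[i], x[j])] / current[(x[i], x[j])]
            gap = max(abs(marg[(i, j) + ab] - c) for ab, c in current.items())
            max_gap = max(max_gap, gap)
        if max_gap < tol:
            break
    return p


def independence_model(docs, k):
    """Baseline: product of smoothed one-term marginals (term independence)."""
    n = len(docs)
    p1 = [(sum(d[i] for d in docs) + 1.0) / (n + 2.0) for i in range(k)]
    return {x: math.prod(p1[i] if x[i] else 1.0 - p1[i] for i in range(k))
            for x in itertools.product((0, 1), repeat=k)}


def log_odds(x, p_rel, p_non):
    """Ranking score (an assumption): log P(x | relevant) - log P(x | non-relevant)."""
    return math.log(p_rel[x]) - math.log(p_non[x])


K = 3
relevant = [(1, 1, 0), (1, 1, 1), (1, 1, 0), (0, 0, 1), (1, 1, 1), (0, 0, 0)]
nonrelevant = [(1, 0, 0), (0, 1, 0), (1, 0, 1), (0, 1, 1), (0, 0, 0), (1, 1, 0)]

dep_rel = ipf_max_entropy(pair_marginals(relevant, K), K)
dep_non = ipf_max_entropy(pair_marginals(nonrelevant, K), K)
ind_rel = independence_model(relevant, K)
ind_non = independence_model(nonrelevant, K)

candidates = [(1, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 1)]
for name, pr, pn in [("dependent", dep_rel, dep_non),
                     ("independent", ind_rel, ind_non)]:
    ranking = sorted(candidates, key=lambda x: log_odds(x, pr, pn), reverse=True)
    print(name, ranking)
```

In the toy "relevant" set, terms 0 and 1 always occur together, so the IPF fit keeps the smoothed co-occurrence probability P(X_0=1, X_1=1) = 4.5/8 = 0.5625, whereas the independence baseline gives 0.625 × 0.625 ≈ 0.39. The baseline is itself a maximum-entropy solution, the one obtained when only one-term marginals are constrained, which is why adding pairwise constraints is a natural way to relax the independence assumption.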
Pages: 763-782
Number of pages: 20
Related papers
50 in total
  • [1] Analysis of Probabilistic model for Document Retrieval in Information Retrieval
    Tamrakar, Astha
    Vishwakarma, Santosh K.
    [J]. 2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN), 2015, : 760 - 765
  • [2] Generalized Ensemble Model for Document Ranking in Information Retrieval
    Wang, Yanshan
    Choi, In-Chan
    Liu, Hongfang
    [J]. COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2017, 14 (01) : 123 - 151
  • [3] Probabilistic Ranking of Documents Using Vectors in Information Retrieval
    Saini, Balwinder
    Singh, Vikram
    [J]. COMPUTATIONAL INTELLIGENCE IN DATA MINING, VOL 1, 2015, 31 : 613 - 624
  • [4] DOCUMENT RETRIEVAL USING A PROBABILISTIC KNOWLEDGE MODEL
    Wang, Shuguang
    Visweswaran, Shyam
    Hauskrecht, Milos
    [J]. KDIR 2009: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL, 2009, : 26 - +
  • [5] Exploring term dependences in probabilistic information retrieval model
    Cho, BH
    Lee, C
    Lee, GG
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2003, 39 (04) : 505 - 519
  • [6] Using Term Location Information to Enhance Probabilistic Information Retrieval
    Liu, Baiyan
    An, Xiangdong
    Huang, Jimmy Xiangji
    [J]. SIGIR 2015: PROCEEDINGS OF THE 38TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2015, : 883 - 886
  • [7] BOOLEAN QUERIES AND TERM DEPENDENCIES IN PROBABILISTIC RETRIEVAL MODELS
    CROFT, WB
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1986, 37 (02): 71 - 77
  • [8] USING PROBABILISTIC MODELS OF DOCUMENT-RETRIEVAL WITHOUT RELEVANCE INFORMATION
    CROFT, WB
    HARPER, DJ
    [J]. JOURNAL OF DOCUMENTATION, 1979, 35 (04) : 285 - 295
  • [9] A Probabilistic Method for Ranking Refinement in Geographic Information Retrieval
    Villatoro-Tello, Esau
    Omar Chavez-Garcia, R.
    Montes-y-Gomez, Manuel
    Villasenor-Pineda, Luis
    Enrique Sucar, L.
    [J]. PROCESAMIENTO DEL LENGUAJE NATURAL, 2010, (44): 123 - 130
  • [10] Achieving High Accuracy Retrieval using Intra-Document Term Ranking
    Woo, Hyun-Wook
    Lee, Jung-Tae
    Lee, Seung-Wook
    Song, Young-In
    Rim, Hae-Chang
    [J]. SIGIR 2010: PROCEEDINGS OF THE 33RD ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH DEVELOPMENT IN INFORMATION RETRIEVAL, 2010, : 885 - 886