A probabilistic information retrieval model by document ranking using term dependencies

Authors
You, Hyun-Jo [1]
Lee, Jung-Jin [2, 3]
Affiliations
[1] Seoul Natl Univ, Program Data Sci Humanities, Seoul, South Korea
[2] Soongsil Univ, Dept Stat & Actuarial Sci, 369 Sangdo Ro, Seoul 06978, South Korea
[3] ADA Univ, 11 Ahmadbay Agha Oglu St, AZ-1008 Baku, Azerbaijan
Keywords
information retrieval; document ranking; maximum entropy principle; iterative proportional fitting algorithm
DOI
10.5351/KJAS.2019.32.5.763
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
This paper proposes a probabilistic document ranking model that incorporates term dependencies. Document ranking is a fundamental information retrieval task: sorting the documents in a collection by their relevance to a user query (Qin et al., Information Retrieval Journal, 13, 346-374, 2010). A probabilistic model computes the conditional probability that each document is relevant given the query. Most widely used models assume term independence because the joint probabilities of multiple terms are difficult to compute, even though words in natural language texts are highly correlated. In this paper, we assume a multinomial distribution model that computes the relevance probability of a document while accounting for the dependency structure of words, and we propose an information retrieval model that ranks documents by estimating this probability with the maximum entropy method. In ranking simulation experiments under various multinomial settings, the proposed model gives better retrieval results than a model that assumes word independence. Document ranking experiments on the real-world LETOR OHSUMED dataset also show better retrieval results.
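The keywords pair the maximum entropy principle with the iterative proportional fitting (IPF) algorithm. As a rough illustration of how the two fit together, and not a reconstruction of the authors' model, the sketch below fits a joint distribution over three binary term indicators to given pairwise marginals. Starting IPF from the uniform table yields the maximum entropy distribution among those matching the marginals. The function name `ipf_pairwise`, the table `true_joint`, and all numbers are hypothetical and chosen only for this example.

```python
import numpy as np


def ipf_pairwise(pair_targets, n_terms, n_iter=500, tol=1e-12):
    """Fit a joint table over n_terms binary indicators to pairwise marginals.

    pair_targets maps (i, j) with i < j to a 2x2 array holding the target
    joint probabilities of terms i and j. This is a hypothetical helper,
    not code from the paper.
    """
    # Uniform seed: IPF then converges to the maximum entropy solution.
    joint = np.full((2,) * n_terms, 1.0 / 2 ** n_terms)
    for _ in range(n_iter):
        max_gap = 0.0
        for (i, j), target in pair_targets.items():
            other_axes = tuple(a for a in range(n_terms) if a not in (i, j))
            current = joint.sum(axis=other_axes)  # current 2x2 marginal of (i, j)
            max_gap = max(max_gap, float(np.abs(current - target).max()))
            # Scale every cell so the (i, j) marginal matches its target.
            ratio = np.divide(target, current,
                              out=np.zeros_like(target), where=current > 0)
            shape = [1] * n_terms
            shape[i] = 2
            shape[j] = 2
            joint = joint * ratio.reshape(shape)
        if max_gap < tol:
            break
    return joint


# Illustrative (made-up) joint distribution of three dependent terms,
# used here only to produce mutually consistent pairwise marginals.
true_joint = np.array([[[0.30, 0.05], [0.05, 0.10]],
                       [[0.05, 0.10], [0.10, 0.25]]])
pair_targets = {
    (0, 1): true_joint.sum(axis=2),
    (0, 2): true_joint.sum(axis=1),
    (1, 2): true_joint.sum(axis=0),
}
fitted = ipf_pairwise(pair_targets, n_terms=3)

# Product of single-term marginals: what a term-independence model would use.
singles = [fitted.sum(axis=tuple(a for a in range(3) if a != k)) for k in range(3)]
independent = np.einsum("i,j,k->ijk", *singles)
```

Comparing `fitted` with `independent` shows how much probability mass an independence assumption misplaces when terms co-occur. The fitted table matches the pairwise marginals but need not equal `true_joint`, because pairwise constraints alone do not determine three-way interactions.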
Pages: 763-782
Number of pages: 20
Related papers
50 in total
  • [41] A hybrid model of image retrieval based on ontology technology and probabilistic ranking
    Fan, Lisa
    Li, Botang
    [J]. 2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 477 - +
  • [42] Mean-Variance Analysis: A New Document Ranking Theory in Information Retrieval
    Wang, Jun
    [J]. ADVANCES IN INFORMATION RETRIEVAL, PROCEEDINGS, 2009, 5478 : 4 - 16
  • [43] A Neural Autoencoder Approach for Document Ranking and Query Refinement in Pharmacogenomic Information Retrieval
    Pfeiffer, Jonas
    Broscheit, Samuel
    Gemulla, Rainer
    Goeschl, Mathias
    [J]. SIGBIOMED WORKSHOP ON BIOMEDICAL NATURAL LANGUAGE PROCESSING (BIONLP 2018), 2018, : 87 - 97
  • [44] Contextualisation of information retrieval process and document ranking task in web search tools
    Bouramoul, Abdelkrim
    [J]. INTERNATIONAL JOURNAL OF SPACE-BASED AND SITUATED COMPUTING, 2016, 6 (02) : 74 - 89
  • [45] Information Retrieval Ranking Using Machine Learning Techniques
    Pandey, Shweta
    Mathur, Iti
    Joshi, Nisheeth
    [J]. PROCEEDINGS 2019 AMITY INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AICAI), 2019, : 86 - 92
  • [46] Concept location using program dependencies and information retrieval (DepIR)
    Petrenko, Maksym
    Rajlich, Vaclav
    [J]. INFORMATION AND SOFTWARE TECHNOLOGY, 2013, 55 (04) : 651 - 659
  • [47] Semantic association ranking schemes for information retrieval applications using term association graph representation
    Veningston, K.
    Shanmugalakshmi, R.
    Nirmala, V.
    [J]. SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2015, 40 (06) : 1793 - 1819
  • [48] Semantic association ranking schemes for information retrieval applications using term association graph representation
    Veningston K.
    Shanmugalakshmi R.
    Nirmala V.
    [J]. Sadhana, 2015, 40 (6) : 1793 - 1819
  • [49] Information Retrieval Using the Reduced Row Echelon Form of a Term-Document Matrix
    Parali, Ufuk
    Zontul, Metin
    Ertugrul, Duygu Celik
    [J]. JOURNAL OF INTERNET TECHNOLOGY, 2019, 20 (04) : 1037 - 1046
  • [50] Using document dimensions for enhanced information retrieval
    Jayasooriya, T
    Manandhar, S
    [J]. APPLIED COMPUTING, PROCEEDINGS, 2004, 3285 : 145 - 152