Improving query expansion using pseudo-relevant web knowledge for information retrieval

Cited by: 17
Authors
Azad, Hiteshwar Kumar [1]
Deepak, Akshay [2]
Chakraborty, Chinmay [3]
Abhishek, Kumar [2]
Affiliations
[1] Vellore Inst Technol, Sch Comp Sci & Engn, Vellore, Tamil Nadu, India
[2] Natl Inst Technol Patna, Dept Comp Sci & Engn, Patna, Bihar, India
[3] Birla Inst Technol, Mesra, Jharkhand, India
Keywords
Information retrieval; Query expansion; Pseudo relevance feedback; Web search; Web knowledge; PERFORMANCE;
DOI
10.1016/j.patrec.2022.04.013
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the field of information retrieval, query expansion (QE) has long been used to deal with the fundamental issue of word mismatch between a user's query and the target information. Existing weighting techniques often fail to capture both the term-term relationship and the relationship between a term and the whole query, resulting in low retrieval effectiveness. Our proposed QE approach addresses this with three weighting models based on (1) tf-idf, (2) k-nearest neighbor (kNN) based cosine similarity, and (3) correlation score. Further, to extract the initial set of expanded terms, we use pseudo-relevant web knowledge consisting of the top N web pages returned by three popular search engines, namely Google, Bing, and DuckDuckGo, in response to the original query. Among the three weighting models, tf-idf scores each of the individual terms obtained from the web content, kNN-based cosine similarity scores the expansion terms to obtain the term-term relationship, and correlation score weighs the selected expansion terms with respect to the whole query. The proposed model, called web knowledge based query expansion (WKQE), achieves an improvement of 25.89% on the Mean Average Precision (MAP) score and 30.83% on the Geometric Mean Average Precision (GMAP) score over the unexpanded queries on the FIRE dataset. A comparative analysis of the WKQE technique with other related approaches clearly shows significant improvement in the retrieval performance. We have also analyzed the effect of varying the number of pseudo-relevant documents and expansion terms on the retrieval effectiveness of the proposed model. (c) 2022 Elsevier B.V. All rights reserved.
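The three-stage weighting described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the formulas, so the smoothed idf, the per-document term-frequency vectors used for cosine similarity, and the co-occurrence sum used as the correlation score are all assumptions chosen for illustration. The function names (`tfidf_scores`, `expand_query`, and so on), the parameters `k` and `n_expand`, and the toy documents standing in for retrieved web pages are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and keep purely alphabetic whitespace-separated tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_scores(tokenized):
    """Stage 1 (assumed form): sum of tf * smoothed idf over all documents."""
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    scores = Counter()
    for toks in tokenized:
        for term, tf in Counter(toks).items():
            scores[term] += tf * math.log(1 + n / df[term])
    return scores


def term_vector(term, tokenized):
    """Represent a term by its frequency in each pseudo-relevant document."""
    return [toks.count(term) for toks in tokenized]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def correlation(term, query_terms, tokenized):
    """Stage 3 (assumed form): co-occurrence of a term with the whole query."""
    vec = term_vector(term, tokenized)
    return sum(
        a * b
        for q in query_terms
        for a, b in zip(vec, term_vector(q, tokenized))
    )


def expand_query(query, docs, pool=20, k=2, n_expand=2):
    query_terms = tokenize(query)
    tokenized = [tokenize(d) for d in docs]

    # Stage 1: candidate pool of top tf-idf terms not already in the query.
    scores = tfidf_scores(tokenized)
    candidates = [t for t, _ in scores.most_common(pool) if t not in query_terms]

    # Stage 2: keep the k candidates nearest (by cosine) to each query term.
    neighbors = []
    for q in query_terms:
        q_vec = term_vector(q, tokenized)
        nearest = sorted(
            candidates,
            key=lambda t: cosine(q_vec, term_vector(t, tokenized)),
            reverse=True,
        )[:k]
        neighbors.extend(t for t in nearest if t not in neighbors)

    # Stage 3: rank the survivors against the whole query.
    ranked = sorted(
        neighbors,
        key=lambda t: correlation(t, query_terms, tokenized),
        reverse=True,
    )
    return query_terms + ranked[:n_expand]


# Toy stand-ins for the text of the top N web pages (hypothetical data).
docs = [
    "python programming language tutorial",
    "python snake reptile habitat",
    "programming language compiler python",
]
expanded = expand_query("python programming", docs)
print(expanded)
```

In this sketch each stage narrows the candidate set: tf-idf builds the pool, the nearest-neighbor step keeps terms close to individual query terms, and the correlation step orders them by their fit to the query as a whole. A real system would fetch and clean web pages, remove stop words, and tune `pool`, `k`, and `n_expand`, none of which is covered here.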
Pages: 148-156
Number of pages: 9