Query2doc: Query Expansion with Large Language Models

Cited by: 0
Authors
Wang, Liang [1]
Yang, Nan [1]
Wei, Furu [1]
Affiliations
[1] Microsoft Research, Beijing, People's Republic of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper introduces a simple yet effective query expansion approach, denoted as query2doc, to improve both sparse and dense retrieval systems. The proposed method first generates pseudo-documents by few-shot prompting large language models (LLMs), and then expands the query with generated pseudo-documents. LLMs are trained on web-scale text corpora and are adept at knowledge memorization. The pseudo-documents from LLMs often contain highly relevant information that can aid in query disambiguation and guide the retrievers. Experimental results demonstrate that query2doc boosts the performance of BM25 by 3% to 15% on ad-hoc IR datasets, such as MS-MARCO and TREC DL, without any model fine-tuning. Furthermore, our method also benefits state-of-the-art dense retrievers in terms of both in-domain and out-of-domain results.
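The two steps named in the abstract (few-shot prompting an LLM for a pseudo-document, then expanding the query with it) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the `generate` callable standing in for an LLM client, the `query_repeats` parameter, and the plain string concatenation are all assumptions, since the abstract does not say how the query and pseudo-document are combined.

```python
from typing import Callable, List, Sequence, Tuple


def build_few_shot_prompt(query: str, examples: Sequence[Tuple[str, str]]) -> str:
    """Format (query, passage) demonstrations followed by the new query.

    The instruction wording here is an assumption made for illustration.
    """
    parts: List[str] = ["Write a passage that answers the given query:", ""]
    for ex_query, ex_passage in examples:
        parts.append(f"Query: {ex_query}")
        parts.append(f"Passage: {ex_passage}")
        parts.append("")
    parts.append(f"Query: {query}")
    parts.append("Passage:")
    return "\n".join(parts)


def expand_query(
    query: str,
    examples: Sequence[Tuple[str, str]],
    generate: Callable[[str], str],
    query_repeats: int = 1,
) -> str:
    """Return the query concatenated with an LLM-generated pseudo-document.

    `generate` is a hypothetical stand-in for any LLM client: it takes a
    prompt string and returns the generated text. `query_repeats` (also an
    assumption) repeats the original query so that its terms are not
    swamped by the much longer pseudo-document in a term-based retriever.
    """
    prompt = build_few_shot_prompt(query, examples)
    pseudo_doc = generate(prompt).strip()
    return " ".join([query] * query_repeats + [pseudo_doc])


def fake_llm(prompt: str) -> str:
    """Canned response so the sketch runs without any model or network."""
    return " Zebras are African equines with black-and-white striped coats. "


demo_examples = [("what is a lion", "Lions are large cats native to Africa.")]
expanded = expand_query("what is a zebra", demo_examples, fake_llm, query_repeats=2)
```

The expanded string would then be passed to the retriever (for example, a BM25 index) in place of the original query; for a dense retriever, the two parts would more plausibly be joined with the encoder's separator token, which this sketch does not attempt.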
Pages: 9414-9423
Number of pages: 10
Related papers
50 in total
  • [1] Corpus-Steered Query Expansion with Large Language Models
    Lei, Yibin
    Cao, Yu
    Zhou, Tianyi
    Shen, Tao
    Yates, Andrew
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS, 2024, : 393 - 401
  • [2] Prompting Is Programming: A Query Language for Large Language Models
    Beurer-Kellner, Luca
    Fischer, Marc
    Vechev, Martin
    PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 2023, 7 (PLDI):
  • [3] Combining Language Models with NLP and Interactive Query Expansion
    SanJuan, Eric
    Ibekwe-SanJuan, Fidelia
    FOCUSED RETRIEVAL AND EVALUATION, 2010, 6203 : 122 - +
  • [4] Query Expansion and Verification with Large Language Model for Information Retrieval
    Zhang, Wenjing
    Liu, Zhaoxiang
    Wang, Kai
    Lian, Shiguo
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IV, ICIC 2024, 2024, 14878 : 341 - 351
  • [5] Criteria2Query 3.0: Leveraging generative large language models for clinical trial eligibility query generation
    Park, Jimyung
    Fang, Yilu
    Ta, Casey
    Zhang, Gongbo
    Idnay, Betina
    Chen, Fangyi
    Feng, David
    Shyu, Rebecca
    Gordon, Emily R.
    Spotnitz, Matthew
    Weng, Chunhua
    JOURNAL OF BIOMEDICAL INFORMATICS, 2024, 154
  • [6] Query Rewriting for Retrieval-Augmented Large Language Models
    Ma, Xinbei
    Gong, Yeyun
    He, Pengcheng
    Zhao, Hai
    Duan, Nan
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 5303 - 5315
  • [7] Query Expansion in Information Retrieval for Urdu Language
    Rasheed, Imran
    Banka, Haider
    2018 FOURTH INTERNATIONAL CONFERENCE ON INFORMATION RETRIEVAL AND KNOWLEDGE MANAGEMENT (CAMP), 2018, : 171 - 176
  • [8] Object Models as Microservices: a Query Language
    Gavrilin, Denis N.
    Kustova, Irina A.
    Mantsivoda, Andrei V.
    BULLETIN OF IRKUTSK STATE UNIVERSITY-SERIES MATHEMATICS, 2022, 42 : 121 - 137
  • [9] Synthetic Query Generation using Large Language Models for Virtual Assistants
    Sannigrahi, Sonal
    Fraga-Silva, Thiago
    Oualil, Youssef
    Van Gysel, Christophe
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2837 - 2841
  • [10] Combining fields for query expansion and adaptive query expansion
    He, Ben
    Ounis, Iadh
    INFORMATION PROCESSING & MANAGEMENT, 2007, 43 (05) : 1294 - 1307