Large Language Models are Built-in Autoregressive Search Engines

Cited by: 0
Authors:
Ziems, Noah [1 ]
Yu, Wenhao [1 ]
Zhang, Zhihan [1 ]
Jiang, Meng [1 ]
Affiliation:
[1] Univ Notre Dame, Notre Dame, IN 46556 USA
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Document retrieval is a key stage of standard Web search engines. Existing dual-encoder dense retrievers obtain representations for questions and documents independently, allowing for only shallow interactions between them. To overcome this limitation, recent autoregressive search engines replace the dual-encoder architecture by directly generating identifiers for relevant documents in the candidate pool. However, the training cost of such autoregressive search engines rises sharply as the number of candidate documents increases. In this paper, we find that large language models (LLMs) can follow human instructions to directly generate URLs for document retrieval. Surprisingly, when providing a few Query-URL pairs as in-context demonstrations, LLMs can generate Web URLs where nearly 90% of the corresponding documents contain correct answers to open-domain questions. In this way, LLMs can be thought of as built-in search engines, since they have not been explicitly trained to map questions to document identifiers. Experiments demonstrate that our method can consistently achieve better retrieval performance than existing retrieval approaches by a significant margin on three open-domain question answering benchmarks, under both zero and few-shot settings. The code for this work can be found at https://github.com/Ziems/llm-url.
Pages: 2666-2678
Number of pages: 13
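The abstract describes prompting an LLM with a few Query-URL demonstrations so that it generates URLs of Web pages likely to contain the answer, which are then used as retrieved documents. The sketch below illustrates that prompting pattern; it assumes an OpenAI-style chat-completion client, and the demonstration pairs, prompt wording, and model name are placeholders rather than the authors' actual configuration (their implementation is in the linked repository).

```python
# Minimal sketch of few-shot Query-URL prompting for document retrieval.
# Assumes the openai client library; demonstrations, prompt text, and the
# model name are illustrative placeholders, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical in-context Query-URL demonstrations.
DEMONSTRATIONS = [
    ("who wrote the novel moby dick",
     "https://en.wikipedia.org/wiki/Moby-Dick"),
    ("what is the capital city of australia",
     "https://en.wikipedia.org/wiki/Canberra"),
]

def build_prompt(question: str) -> str:
    """Format the demonstrations followed by the new question."""
    lines = ["Generate a Web URL whose page answers the question.", ""]
    for q, url in DEMONSTRATIONS:
        lines += [f"Question: {q}", f"URL: {url}", ""]
    lines += [f"Question: {question}", "URL:"]
    return "\n".join(lines)

def generate_urls(question: str, n: int = 3) -> list[str]:
    """Sample n candidate URLs from the model and dedupe them in order."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any instruction-following LLM
        messages=[{"role": "user", "content": build_prompt(question)}],
        n=n,
        temperature=0.7,
    )
    urls = []
    for choice in response.choices:
        text = (choice.message.content or "").strip()
        if text.startswith("http"):
            urls.append(text.split()[0])
    return list(dict.fromkeys(urls))  # keep first occurrence of each URL

if __name__ == "__main__":
    for url in generate_urls("when was the university of notre dame founded"):
        print(url)
```

In the retrieval-then-read setup the abstract evaluates, the pages behind the generated URLs would then be fetched and passed to a reader model to answer the open-domain question.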
Related papers (50 records in total)
  • [1] Evaluating search engines and large language models for answering health questions
    Fernandez-Pichel, Marcos
    Pichel, Juan C.
    Losada, David E.
    NPJ DIGITAL MEDICINE, 2025, 8 (01):
  • [2] Efficient multi-event monitoring using built-in search engines
    Zhong, Zhaoman
    Liu, Zongtian
    Hu, Yun
    Li, Cunhua
    FRONTIERS OF COMPUTER SCIENCE, 2016, 10 (02) : 281 - 291
  • [3] From Search Engines to Large Language Models: A Big Leap for Patient Education!
    Barabino, Emanuele
    Cittadini, Giuseppe
    CARDIOVASCULAR AND INTERVENTIONAL RADIOLOGY, 2024, 47 (02) : 251 - 252
  • [4] Beyond Search Engines: Can Large Language Models Improve Curriculum Development?
    Moein, Mohammad
    Hajiagha, Mohammadreza Molavi
    Faraji, Abdolali
    Tavakoli, Mohammadreza
    Kismihok, Gabor
    TECHNOLOGY ENHANCED LEARNING FOR INCLUSIVE AND EQUITABLE QUALITY EDUCATION, PT II, EC-TEL 2024, 2024, 15160 : 131 - 136
  • [5] Proto-lm: A Prototypical Network-Based Framework for Built-in Interpretability in Large Language Models
    Xie, Sean
    Vosoughi, Soroush
    Hassanpour, Saeed
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 3964 - 3979
  • [6] Large Language Models as Molecular Design Engines
    Bhattacharya, Debjyoti
    Cassady, Harrison J.
    Hickner, Michael A.
    Reinhart, Wesley F.
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2024, 64 (18) : 7086 - 7096
  • [7] Analysis of Composite Scrubber with Built-In Silencer for Marine Engines
    Ryu, Myeong-Rok
    Park, Kweonha
    JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2021, 9 (09)