Large Language Models are Built-in Autoregressive Search Engines

Authors
Ziems, Noah [1]
Yu, Wenhao [1]
Zhang, Zhihan [1]
Jiang, Meng [1]
Affiliations
[1] Univ Notre Dame, Notre Dame, IN 46556 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Document retrieval is a key stage of standard Web search engines. Existing dual-encoder dense retrievers obtain representations for questions and documents independently, allowing for only shallow interactions between them. To overcome this limitation, recent autoregressive search engines replace the dual-encoder architecture by directly generating identifiers for relevant documents in the candidate pool. However, the training cost of such autoregressive search engines rises sharply as the number of candidate documents increases. In this paper, we find that large language models (LLMs) can follow human instructions to directly generate URLs for document retrieval. Surprisingly, when provided with a few query-URL pairs as in-context demonstrations, LLMs can generate Web URLs for which nearly 90% of the corresponding documents contain correct answers to open-domain questions. In this sense, LLMs can be thought of as built-in search engines, since they have not been explicitly trained to map questions to document identifiers. Experiments on three open-domain question answering benchmarks demonstrate that our method consistently achieves better retrieval performance than existing retrieval approaches by a significant margin, under both zero-shot and few-shot settings. The code for this work can be found at https://github.com/Ziems/llm-url.
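The few-shot setup described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the repository linked above for that). The prompt wording, the function names `build_prompt`, `extract_urls`, and `contains_answer`, and the substring-based answer check are all assumptions made for illustration. The call to an actual LLM and the download of each URL are deliberately left out.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical pattern: any run of non-whitespace characters after http(s)://.
URL_PATTERN = re.compile(r"https?://\S+")


def build_prompt(demos: Sequence[Tuple[str, str]], question: str) -> str:
    """Format query-URL pairs as in-context demonstrations, then the new question.

    The instruction line is an assumed wording, not the prompt used in the paper.
    """
    lines = ["Which URL is most likely to contain the answer to each question?", ""]
    for demo_question, demo_url in demos:
        lines.append(f"Question: {demo_question}")
        lines.append(f"URL: {demo_url}")
        lines.append("")
    lines.append(f"Question: {question}")
    lines.append("URL:")
    return "\n".join(lines)


def extract_urls(generated: str, limit: int = 10) -> List[str]:
    """Pull distinct URLs out of generated text, keeping first-seen order."""
    urls: List[str] = []
    for match in URL_PATTERN.findall(generated):
        url = match.rstrip(".,;")  # drop trailing sentence punctuation
        if url not in urls:
            urls.append(url)
    return urls[:limit]


def contains_answer(document_text: str, answers: Sequence[str]) -> bool:
    """Assumed hit criterion: some reference answer appears in the document text."""
    text = document_text.lower()
    return any(answer.lower() in text for answer in answers)
```

In use, the string returned by `build_prompt` would be sent to a language model, `extract_urls` would be applied to the completion, and `contains_answer` would be run on the text fetched from each URL to decide whether the retrieved document counts as a hit.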
Pages: 2666-2678
Page count: 13