Large Language Models are Built-in Autoregressive Search Engines

Cited by: 0

Authors:

Ziems, Noah [1]

Yu, Wenhao [1]

Zhang, Zhihan [1]

Jiang, Meng [1]

Affiliations:

[1] Univ Notre Dame, Notre Dame, IN 46556 USA

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023 | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Document retrieval is a key stage of standard Web search engines. Existing dual-encoder dense retrievers obtain representations for questions and documents independently, allowing for only shallow interactions between them. To overcome this limitation, recent autoregressive search engines replace the dual-encoder architecture by directly generating identifiers for relevant documents in the candidate pool. However, the training cost of such autoregressive search engines rises sharply as the number of candidate documents increases. In this paper, we find that large language models (LLMs) can follow human instructions to directly generate URLs for document retrieval. Surprisingly, when providing a few Query-URL pairs as in-context demonstrations, LLMs can generate Web URLs where nearly 90% of the corresponding documents contain correct answers to open-domain questions. In this way, LLMs can be thought of as built-in search engines, since they have not been explicitly trained to map questions to document identifiers. Experiments demonstrate that our method can consistently achieve better retrieval performance than existing retrieval approaches by a significant margin on three open-domain question answering benchmarks, under both zero and few-shot settings. The code for this work can be found at https://github.com/Ziems/llm-url.
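
The few-shot setup described in the abstract, where a handful of Query-URL pairs are shown to the model before the new question, can be sketched as follows. This is only an illustrative sketch, not the authors' implementation (see their repository for that): the instruction wording, the `Query:`/`URL:` layout, and the helper names `build_prompt` and `extract_urls` are all assumptions made here, and the call to an actual LLM is deliberately left out.

```python
import re
from typing import List, Tuple

# Hypothetical instruction text; the paper's exact prompt may differ.
INSTRUCTION = "Which URL would best answer the question?"

# Simple pattern for http(s) URLs in free-form generated text.
URL_PATTERN = re.compile(r"https?://[^\s<>]+")


def build_prompt(demos: List[Tuple[str, str]], question: str) -> str:
    """Format Query-URL demonstrations, then the new question with an open URL slot."""
    lines = [INSTRUCTION, ""]
    for query, url in demos:
        lines += [f"Query: {query}", f"URL: {url}", ""]
    lines += [f"Query: {question}", "URL:"]
    return "\n".join(lines)


def extract_urls(generated: str) -> List[str]:
    """Collect distinct URLs from model output, preserving first-seen order."""
    seen, urls = set(), []
    for match in URL_PATTERN.findall(generated):
        url = match.rstrip(".,;")  # drop trailing sentence punctuation
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

With an empty `demos` list the same prompt builder covers the zero-shot setting; the URLs returned by `extract_urls` would then be fetched and their documents passed on to the question answering stage.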


Pages: 2666-2678

Page count: 13
