Parallel Context Windows for Large Language Models

Cited by: 0

Authors
Ratner, Nir [1 ]
Levine, Yoav [1 ]
Belinkov, Yonatan [1 ]
Ram, Ori [1 ]
Magar, Inbal [1 ]
Abend, Omri [1 ]
Karpas, Ehud [1 ]
Shashua, Amnon [1 ]
Leyton-Brown, Kevin [1 ]
Shoham, Yoav [1 ]
Affiliations
[1] AI21 Labs, Tel Aviv, Israel
Keywords
DOI
None available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When applied to processing long text, Large Language Models (LLMs) are limited by their context window. Existing efforts to address this limitation involve training specialized architectures, and cannot be easily applied to off-the-shelf LLMs. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks ("windows"), restrict the attention mechanism to apply only within each window, and re-use the positional embeddings across the windows. Our main results test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters, and show substantial improvements for tasks with diverse input and output spaces. We show additional benefits in other settings where long context windows may be beneficial: multi-hop questions and retrieval-augmented question answering with multiple retrieved documents. Our results highlight Parallel Context Windows as a promising method for applying off-the-shelf LLMs in a range of settings that require long text sequences. We make our code publicly available at https://github.com/ai21labs/parallel-context-windows.
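The mechanism the abstract describes — causal attention restricted to each window, with positional embeddings re-used across windows so every window occupies the same positional range — can be sketched as a mask-and-positions construction. This is an illustrative reconstruction from the abstract's wording, not the authors' released code; the function name `pcw_mask_and_positions`, its arguments, and the choice to continue the task tokens' positions after the longest window are assumptions.

```python
def pcw_mask_and_positions(window_lens, task_len):
    """Sketch of Parallel Context Windows attention (assumed interface).

    Context tokens attend causally only within their own window, and each
    window's position ids restart from 0 (re-used positional embeddings).
    The task (query) tokens at the end attend to all windows plus themselves,
    with positions continuing after the longest window.
    Returns (mask, pos): mask[i][j] is True iff token i may attend to token j.
    """
    total = sum(window_lens) + task_len
    mask = [[False] * total for _ in range(total)]
    pos = [0] * total

    start = 0
    for w in window_lens:
        for i in range(w):
            for j in range(start, start + i + 1):
                mask[start + i][j] = True   # causal, within this window only
            pos[start + i] = i              # positions restart per window
        start += w

    max_w = max(window_lens)
    for i in range(task_len):
        row = start + i
        for j in range(row + 1):
            mask[row][j] = True             # task tokens see every window
        pos[row] = max_w + i                # positions continue after longest window
    return mask, pos
```

For two windows of length 3 and two task tokens, the position ids come out as `[0, 1, 2, 0, 1, 2, 3, 4]`: the second window re-uses positions 0-2, so the model never sees an offset larger than one window plus the task, while the block-diagonal mask keeps the windows from attending to each other.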
Pages: 6383-6402
Page count: 20
Related Papers
50 results
  • [1] Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models
    Zhang, Weigang
    Zhou, Biyu
    Wu, Xing
    Gao, Chaochen
    Liu, Zhibing
    Tang, Xuehai
    Li, Ruixuan
    Han, Jizhong
    Hu, Songlin
    [J]. EURO-PAR 2024: PARALLEL PROCESSING, PART II, EURO-PAR 2024, 2024, 14802 : 424 - 438
  • [2] Large Language Models are Not Models of Natural Language: They are Corpus Models
    Veres, Csaba
    [J]. IEEE ACCESS, 2022, 10 : 61970 - 61979
  • [3] Large Language Models
    Vargas, Diego Collarana
    Katsamanis, Nassos
    [J]. ERCIM NEWS, 2024, (136): : 12 - 13
  • [4] Large Language Models
    Cerf, Vinton G.
    [J]. COMMUNICATIONS OF THE ACM, 2023, 66 (08) : 7 - 7
  • [5] Large Language Models in der Wissenschaft (Large language models in science)
    Karl-Friedrich Kowalewski
    Severin Rodler
    [J]. Die Urologie, 2024, 63 (9) : 860 - 866
  • [6] The Importance of Understanding Language in Large Language Models
    Youssef, Alaa
    Stein, Samantha
    Clapp, Justin
    Magnus, David
    [J]. AMERICAN JOURNAL OF BIOETHICS, 2023, 23 (10): : 6 - 7
  • [7] Dissociating language and thought in large language models
    Mahowald, Kyle
    Ivanova, Anna A.
    Blank, Idan A.
    Kanwisher, Nancy
    Tenenbaum, Joshua B.
    Fedorenko, Evelina
    [J]. TRENDS IN COGNITIVE SCIENCES, 2024, 28 (06) : 517 - 540
  • [8] Imitation and Large Language Models
    Boisseau, Éloïse
    [J]. Minds and Machines, 2024, 34 (04)
  • [9] Autoformalization with Large Language Models
    Wu, Yuhuai
    Jiang, Albert Q.
    Li, Wenda
    Rabe, Markus N.
    Staats, Charles
    Jamnik, Mateja
    Szegedy, Christian
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [10] The Smallness of Large Language Models
    Denning, Peter J.
    [J]. COMMUNICATIONS OF THE ACM, 2023, 66 (09) : 24 - 27