Parallel Context Windows for Large Language Models

Cited by: 0

Authors
Ratner, Nir [1 ]
Levine, Yoav [1 ]
Belinkov, Yonatan [1 ]
Ram, Ori [1 ]
Magar, Inbal [1 ]
Abend, Omri [1 ]
Karpas, Ehud [1 ]
Shashua, Amnon [1 ]
Leyton-Brown, Kevin [1 ]
Shoham, Yoav [1 ]
Affiliations
[1] AI21 Labs, Tel Aviv, Israel
Keywords
DOI
None available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When applied to processing long text, Large Language Models (LLMs) are limited by their context window. Existing efforts to address this limitation involve training specialized architectures, and cannot be easily applied to off-the-shelf LLMs. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks ("windows"), restrict the attention mechanism to apply only within each window, and re-use the positional embeddings across the windows. Our main results test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters, and show substantial improvements for tasks with diverse input and output spaces. We show additional benefits in other settings where long context windows may be beneficial: multi-hop questions and retrieval-augmented question answering with multiple retrieved documents. Our results highlight Parallel Context Windows as a promising method for applying off-the-shelf LLMs in a range of settings that require long text sequences. We make our code publicly available at https://github.com/ai21labs/parallel-context-windows.
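The mechanism the abstract describes — causal attention restricted to each window, with positional embeddings re-used across windows so every window occupies the same positional range — can be sketched as a mask-and-positions construction. This is an illustrative reconstruction from the abstract's wording, not the authors' released code; the function name `pcw_mask_and_positions`, its arguments, and the choice to continue the task tokens' positions after the longest window are assumptions.

```python
def pcw_mask_and_positions(window_lens, task_len):
    """Sketch of Parallel Context Windows attention (assumed interface).

    Context tokens attend causally only within their own window, and each
    window's position ids restart from 0 (re-used positional embeddings).
    The task (query) tokens at the end attend to all windows plus themselves,
    with positions continuing after the longest window.
    Returns (mask, pos): mask[i][j] is True iff token i may attend to token j.
    """
    total = sum(window_lens) + task_len
    mask = [[False] * total for _ in range(total)]
    pos = [0] * total

    start = 0
    for w in window_lens:
        for i in range(w):
            for j in range(start, start + i + 1):
                mask[start + i][j] = True   # causal, within this window only
            pos[start + i] = i              # positions restart per window
        start += w

    max_w = max(window_lens)
    for i in range(task_len):
        row = start + i
        for j in range(row + 1):
            mask[row][j] = True             # task tokens see every window
        pos[row] = max_w + i                # positions continue after longest window
    return mask, pos
```

For two windows of length 3 and two task tokens, the position ids come out as `[0, 1, 2, 0, 1, 2, 3, 4]`: the second window re-uses positions 0-2, so the model never sees an offset larger than one window plus the task, while the block-diagonal mask keeps the windows from attending to each other.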
Pages: 6383-6402
Page count: 20
Related Papers
50 results
  • [1] Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models
    Zhang, Weigang
    Zhou, Biyu
    Wu, Xing
    Gao, Chaochen
    Liu, Zhibing
    Tang, Xuehai
    Li, Ruixuan
    Han, Jizhong
    Hu, Songlin
    [J]. EURO-PAR 2024: PARALLEL PROCESSING, PART II, EURO-PAR 2024, 2024, 14802 : 424 - 438
  • [2] Large Language Models are Not Models of Natural Language: They are Corpus Models
    Veres, Csaba
    [J]. IEEE ACCESS, 2022, 10 : 61970 - 61979
  • [3] Large Language Models
    Vargas, Diego Collarana
    Katsamanis, Nassos
    [J]. ERCIM NEWS, 2024, (136): : 12 - 13
  • [4] Large Language Models
    Cerf, Vinton G.
    [J]. COMMUNICATIONS OF THE ACM, 2023, 66 (08) : 7 - 7
  • [5] Large Language Models in der Wissenschaft (Large language models in science)
    Karl-Friedrich Kowalewski
    Severin Rodler
    [J]. Die Urologie, 2024, 63 (9) : 860 - 866
  • [6] The Importance of Understanding Language in Large Language Models
    Youssef, Alaa
    Stein, Samantha
    Clapp, Justin
    Magnus, David
    [J]. AMERICAN JOURNAL OF BIOETHICS, 2023, 23 (10): : 6 - 7
  • [7] Dissociating language and thought in large language models
    Mahowald, Kyle
    Ivanova, Anna A.
    Blank, Idan A.
    Kanwisher, Nancy
    Tenenbaum, Joshua B.
    Fedorenko, Evelina
    [J]. TRENDS IN COGNITIVE SCIENCES, 2024, 28 (06) : 517 - 540
  • [8] Imitation and Large Language Models
    Boisseau, Éloïse
    [J]. Minds and Machines, 2024, 34 (04)
  • [9] Autoformalization with Large Language Models
    Wu, Yuhuai
    Jiang, Albert Q.
    Li, Wenda
    Rabe, Markus N.
    Staats, Charles
    Jamnik, Mateja
    Szegedy, Christian
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [10] The Smallness of Large Language Models
    Denning, Peter J.
    [J]. COMMUNICATIONS OF THE ACM, 2023, 66 (09) : 24 - 27