General-purpose, long-context autoregressive modeling with Perceiver AR

Cited by: 0
Authors
Hawthorne, Curtis [1]
Jaegle, Andrew [2]
Cangea, Catalina [2]
Borgeaud, Sebastian [2]
Nash, Charlie [2]
Malinowski, Mateusz [2]
Dieleman, Sander [2]
Vinyals, Oriol [2]
Botvinick, Matthew [2]
Simon, Ian [1]
Sheahan, Hannah [2]
Zeghidour, Neil [1]
Alayrac, Jean-Baptiste [2]
Carreira, Joao [2]
Engel, Jesse [1]
Affiliations
[1] Google Research, Brain Team, Mountain View, CA 94043, USA
[2] DeepMind, London, England
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture this long-range structure. We develop Perceiver AR, an autoregressive, modality-agnostic architecture which uses cross-attention to map long-range inputs to a small number of latents while also maintaining end-to-end causal masking. Perceiver AR can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation without the need for hand-crafted sparsity patterns or memory mechanisms. When trained on images or music, Perceiver AR generates outputs with clear long-term coherence and structure. Our architecture also obtains state-of-the-art likelihood on long-sequence benchmarks, including 64 × 64 ImageNet images and PG-19 books.
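The core idea in the abstract, cross-attention from a long input to a small set of latents under causal masking, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `causal_cross_attention`, the random projection matrices, the single attention head, and the choice to take queries from the final `num_latents` input positions are all assumptions made for illustration.

```python
import numpy as np


def causal_cross_attention(x, num_latents, seed=0):
    """Map a long input x of shape (seq_len, dim) to num_latents outputs.

    Hypothetical single-head sketch: the projection weights are random
    stand-ins for learned parameters, fixed by `seed`.
    """
    seq_len, dim = x.shape
    rng = np.random.default_rng(seed)
    w_q = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    w_k = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    w_v = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    # Assumption: queries come from the last num_latents input positions,
    # while keys and values come from the full input.
    q = x[-num_latents:] @ w_q            # (num_latents, dim)
    k = x @ w_k                           # (seq_len, dim)
    v = x @ w_v                           # (seq_len, dim)

    scores = q @ k.T / np.sqrt(dim)       # (num_latents, seq_len)

    # Causal mask: latent i sits at input position seq_len - num_latents + i
    # and may only attend to input positions at or before its own.
    latent_pos = np.arange(seq_len - num_latents, seq_len)[:, None]
    input_pos = np.arange(seq_len)[None, :]
    scores = np.where(input_pos <= latent_pos, scores, -np.inf)

    # Numerically stable softmax over the input axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ v, weights


# Usage: 16 input positions are summarized by 4 latents.
x = np.random.default_rng(1).standard_normal((16, 8))
out, weights = causal_cross_attention(x, num_latents=4)
print(out.shape)  # (4, 8)
```

In this sketch the score matrix has shape `(num_latents, seq_len)` rather than `(seq_len, seq_len)`, which is why the cost of attending to the input grows linearly with input length for a fixed number of latents.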
Pages: 24