General-purpose, long-context autoregressive modeling with Perceiver AR

Cited by: 0
Authors
Hawthorne, Curtis [1]
Jaegle, Andrew [2]
Cangea, Catalina [2]
Borgeaud, Sebastian [2]
Nash, Charlie [2]
Malinowski, Mateusz [2]
Dieleman, Sander [2]
Vinyals, Oriol [2]
Botvinick, Matthew [2]
Simon, Ian [1]
Sheahan, Hannah [2]
Zeghidour, Neil [1]
Alayrac, Jean-Baptiste [2]
Carreira, Joao [2]
Engel, Jesse [1]
Affiliations
[1] Google Res, Brain Team, Mountain View, CA 94043 USA
[2] DeepMind, London, England
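Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract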
Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture this long-range structure. We develop Perceiver AR, an autoregressive, modality-agnostic architecture which uses cross-attention to map long-range inputs to a small number of latents while also maintaining end-to-end causal masking. Perceiver AR can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation without the need for hand-crafted sparsity patterns or memory mechanisms. When trained on images or music, Perceiver AR generates outputs with clear long-term coherence and structure. Our architecture also obtains state-of-the-art likelihood on long-sequence benchmarks, including 64 x 64 ImageNet images and PG-19 books.
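The core mechanism described in the abstract is cross-attention from a long input of length N to a small set of M latents while keeping causal masking end to end. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. It assumes a single attention head with no learned projections, and it aligns latent j with input position N - M + j, taking that position's vector as the latent's query. All function names are hypothetical. A rough query-key pair count is included to show why the cross-attention step scales as N × M rather than N².

```python
import math
import random


def causal_cross_attention_mask(n_inputs: int, n_latents: int) -> list[list[bool]]:
    """mask[j][i] is True when latent j may attend to input position i.

    Assumption: latent j is aligned with input position n_inputs - n_latents + j,
    so it may only see inputs at or before that position (end-to-end causality).
    """
    offset = n_inputs - n_latents
    return [[i <= offset + j for i in range(n_inputs)] for j in range(n_latents)]


def masked_softmax(scores: list[float], allowed: list[bool]) -> list[float]:
    # Masked positions get exactly zero weight; subtract the max for stability.
    peak = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


def causal_cross_attend(queries, keys, values, mask):
    """Single-head scaled dot-product attention from M latent queries to N inputs.

    Simplification: no learned projections; inputs serve directly as keys and values.
    """
    d = len(keys[0])
    outputs = []
    for q, allowed in zip(queries, mask):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = masked_softmax(scores, allowed)
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return outputs


def attention_pairs(n_inputs: int, n_latents: int, n_latent_layers: int) -> tuple[int, int]:
    """Rough count of query-key pairs scored, ignoring the halving from causal masks.

    Perceiver AR sketch: one N x M cross-attention, then latent self-attention layers (M x M).
    Dense baseline: the same total number of layers, each N x N.
    """
    perceiver_ar = n_inputs * n_latents + n_latent_layers * n_latents**2
    dense = (n_latent_layers + 1) * n_inputs**2
    return perceiver_ar, dense


if __name__ == "__main__":
    random.seed(0)
    n, m, d = 8, 3, 4
    x = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    mask = causal_cross_attention_mask(n, m)
    # Queries come from the last m input positions (one per latent).
    latents = causal_cross_attend(x[-m:], x, x, mask)
    print(len(latents), len(latents[0]))  # 3 4
    print(attention_pairs(65536, 1024, 12))  # (79691776, 55834574848)
```

Because latent j never sees positions after N - M + j, changing a future input leaves earlier latents untouched, which is what allows autoregressive training on all M output positions at once. The sizes 65,536 / 1,024 / 12 are illustrative only and are not configurations reported for Perceiver AR.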
Pages: 24
Related papers (50 total)
  • [41] A DYNAMIC-SYSTEM-BASED APPROACH TO MODELING DRIVER MOVEMENTS ACROSS GENERAL-PURPOSE/MANAGED LANE INTERFACES
    Wright, Matthew A.
    Horowitz, Roberto
    Kurzhanskiy, Alex A.
    PROCEEDINGS OF THE ASME 11TH ANNUAL DYNAMIC SYSTEMS AND CONTROL CONFERENCE, 2018, VOL 2, 2018
  • [42] Modeling and Analyzing the Strategy Game "Factorio" Using Modular Petri Nets and the General-Purpose Petri Net Simulator
    Chandler, Benjamin Alexander
    Davidrajuh, Reggie
    ELECTRONICS, 2024, 13 (07)
  • [43] Proposal to use LHC general-purpose detectors in "beam-dump" measurements for long-lived particles
    Dutta, Bhaskar
    Kim, Doojin
    Kim, Hyunyong
    PHYSICS LETTERS B, 2025, 861
  • [44] A general-purpose tool for modeling multifunctional thin porous media (POREnet): From pore network to effective property tensors
    Garcia-Salaberri, Pablo A.
    Zenyuk, Iryna V.
    HELIYON, 2024, 10 (04)
  • [46] Modeling Dynamics of Vehicle-Based Performance Measures of High-Occupancy Vehicle and General-Purpose Traffic Systems
    Mulokozi, Eneliko
    Teng, Hualiang
    Kwigizile, Valerian
    Chimba, Deo
    Sando, Thobias
    JOURNAL OF TRANSPORTATION ENGINEERING PART A-SYSTEMS, 2018, 144 (02)
  • [47] Modeling airside airport operations using general-purpose, activity-based, discrete-event simulation tools
    Martinez, JC
    Trani, AA
    Ioannou, PG
    ISSUES IN AVIATION: AIRPORTS, CAPACITY, AND AIR TRAFFIC CONTROL AND MANAGEMENT: AVIATION, 2001, (1744): 65-71
  • [48] Performance of 3D Wave Field Modeling Using the Staggered Grid Finite Difference Method with General-Purpose Processors
    Franczyk, Anna
    Gwizdz, Damian
    Lesniak, Andrzej
    ENERGIES, 2020, 13 (17)
  • [49] General-purpose numerical deposition modeling methodology based on mesh geometry reconstruction strategy in cold spray additive manufacturing system
    Li, Wenbo
    Wu, Hongjian
    Sokore, Mohamed
    Raoelison, Rija Nirina
    Liao, Hanlin
    Costil, Sophie
    Deng, Sihao
    SURFACE & COATINGS TECHNOLOGY, 2023, 464
  • [50] Advanced FE nonlinear numerical modeling to predict historical masonry vaults failure: Assessment of risk collapse for a long span cloister vault heavily loaded at the crown by means of a general-purpose numerical protocol
    Pingaro, Natalia
    Buzzetti, Martina
    Milani, Gabriele
    ENGINEERING FAILURE ANALYSIS, 2025, 167