General-purpose, long-context autoregressive modeling with Perceiver AR

Authors
Hawthorne, Curtis [1]
Jaegle, Andrew [2]
Cangea, Catalina [2]
Borgeaud, Sebastian [2]
Nash, Charlie [2]
Malinowski, Mateusz [2]
Dieleman, Sander [2]
Vinyals, Oriol [2]
Botvinick, Matthew [2]
Simon, Ian [1]
Sheahan, Hannah [2]
Zeghidour, Neil [1]
Alayrac, Jean-Baptiste [2]
Carreira, Joao [2]
Engel, Jesse [1]
Affiliations
[1] Google Research, Brain Team, Mountain View, CA 94043, USA
[2] DeepMind, London, England
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture this long-range structure. We develop Perceiver AR, an autoregressive, modality-agnostic architecture which uses cross-attention to map long-range inputs to a small number of latents while also maintaining end-to-end causal masking. Perceiver AR can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation without the need for hand-crafted sparsity patterns or memory mechanisms. When trained on images or music, Perceiver AR generates outputs with clear long-term coherence and structure. Our architecture also obtains state-of-the-art likelihood on long-sequence benchmarks, including 64 x 64 ImageNet images and PG-19 books.
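The abstract's central mechanism, cross-attention from a long input onto a small set of latents under a causal mask, can be illustrated with a minimal single-head sketch in plain Python. It assumes, following the paper's design, that the latent queries are taken from the last M input positions, so latent j may attend only to inputs at or before its own position. The identity query/key/value projections and the function name `causal_cross_attention` are simplifications introduced here for illustration, not the authors' code, and the sketch covers only the initial cross-attention, not the causally masked latent self-attention layers that follow it.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def causal_cross_attention(
    inputs: List[Vector], num_latents: int
) -> Tuple[List[Vector], List[List[float]]]:
    """Map N input vectors onto num_latents outputs with a causal mask.

    Hypothetical simplifications: one head, identity projections, and
    latent queries taken from the last num_latents input positions.
    """
    n = len(inputs)
    if not 0 < num_latents <= n:
        raise ValueError("num_latents must be in 1..len(inputs)")
    d = len(inputs[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for j in range(num_latents):
        pos = n - num_latents + j    # absolute position of latent j
        query = inputs[pos]
        visible = inputs[: pos + 1]  # causal mask: only keys at positions <= pos
        w = softmax([dot(query, k) * scale for k in visible])
        # Attention-weighted sum of the visible value vectors.
        out = [sum(wi * v[c] for wi, v in zip(w, visible)) for c in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Usage: 8 input positions compressed onto 3 latents.
seq = [[float(i % 3), 1.0] for i in range(8)]
latents, attn = causal_cross_attention(seq, num_latents=3)
print(len(latents), [len(w) for w in attn])  # 3 [6, 7, 8]
```

Because each latent scores only the positions it can see, this step costs on the order of N·M dot products rather than the O(N²) of self-attention over every input position, which is what lets the input length grow far beyond the latent count.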
Pages: 24