General-purpose, long-context autoregressive modeling with Perceiver AR

Authors
Hawthorne, Curtis [1 ]
Jaegle, Andrew [2 ]
Cangea, Catalina [2 ]
Borgeaud, Sebastian [2 ]
Nash, Charlie [2 ]
Malinowski, Mateusz [2 ]
Dieleman, Sander [2 ]
Vinyals, Oriol [2 ]
Botvinick, Matthew [2 ]
Simon, Ian [1 ]
Sheahan, Hannah [2 ]
Zeghidour, Neil [1 ]
Alayrac, Jean-Baptiste [2 ]
Carreira, Joao [2 ]
Engel, Jesse [1 ]
Affiliations
[1] Google Research, Brain Team, Mountain View, CA 94043, USA
[2] DeepMind, London, England
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Real-world data is high-dimensional: a book, image, or musical performance can easily contain hundreds of thousands of elements even after compression. However, the most commonly used autoregressive models, Transformers, are prohibitively expensive to scale to the number of inputs and layers needed to capture this long-range structure. We develop Perceiver AR, an autoregressive, modality-agnostic architecture which uses cross-attention to map long-range inputs to a small number of latents while also maintaining end-to-end causal masking. Perceiver AR can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation without the need for hand-crafted sparsity patterns or memory mechanisms. When trained on images or music, Perceiver AR generates outputs with clear long-term coherence and structure. Our architecture also obtains state-of-the-art likelihood on long-sequence benchmarks, including 64 × 64 ImageNet images and PG-19 books.
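The abstract's core mechanism is a causally masked cross-attention that compresses a long input into a small number of latents. A minimal pure-Python sketch can illustrate it. This is not the authors' implementation, and every name in it (`masked_attention`, `perceiver_ar_block`, `attention_score_counts`) is hypothetical. The sketch makes several simplifying assumptions:

- The M latent queries are taken from the last M input positions.
- Each latent may attend only to inputs at or before its own position.
- Identity maps replace learned query/key/value projections.
- There are no MLPs, multiple heads, or positional encodings.

The cost helper is only a rough count of query-key scores. It compares L dense self-attention layers with one N × M cross-attention plus (L - 1) M × M latent layers, and it ignores the savings from masking.

```python
import math
import random

Vec = list[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vec) -> Vec:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention(queries, keys, values, query_pos, key_pos):
    """Scaled dot-product attention; a query at position p sees only keys at positions <= p."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q, qp in zip(queries, query_pos):
        visible = [j for j, kp in enumerate(key_pos) if kp <= qp]
        weights = softmax([dot(q, keys[j]) * scale for j in visible])
        out = [0.0] * len(values[0])
        for w, j in zip(weights, visible):
            for k, v in enumerate(values[j]):
                out[k] += w * v
        outputs.append(out)
    return outputs


def perceiver_ar_block(inputs: list[Vec], num_latents: int) -> list[Vec]:
    """Hypothetical sketch: one causal cross-attention from all N inputs into the
    last num_latents positions, then one causal self-attention over the latents.
    Identity projections stand in for learned query/key/value weights."""
    n = len(inputs)
    all_pos = list(range(n))
    latent_pos = all_pos[n - num_latents:]
    # Latent i sits at position n - num_latents + i and sees inputs 0..that position.
    latents = masked_attention(inputs[n - num_latents:], inputs, inputs, latent_pos, all_pos)
    # Latent self-attention costs O(M^2) per layer instead of O(N^2).
    return masked_attention(latents, latents, latents, latent_pos, latent_pos)


def attention_score_counts(n: int, m: int, layers: int) -> tuple[int, int]:
    """Rough count of query-key scores: a dense decoder versus one N x M
    cross-attention plus (layers - 1) M x M latent self-attention layers."""
    dense = layers * n * n
    perceiver_ar = n * m + (layers - 1) * m * m
    return dense, perceiver_ar


if __name__ == "__main__":
    random.seed(0)
    seq = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(16)]
    out = perceiver_ar_block(seq, num_latents=4)
    # Changing only the final input must leave the first three latents untouched.
    seq2 = [row[:] for row in seq]
    seq2[-1] = [5.0, -5.0, 5.0, -5.0]
    out2 = perceiver_ar_block(seq2, num_latents=4)
    print(out[:3] == out2[:3])  # True
    print(attention_score_counts(65_536, 1_024, 24))  # (103079215104, 91226112)
```

Under this sketch's assumptions, perturbing the final input changes only the last latent. This causal property is what would let every latent position be trained in parallel to predict the token that follows it.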
Pages: 24