Birth of a Transformer: A Memory Viewpoint

Cited by: 0
Authors
Bietti, Alberto [1 ,2 ]
Cabannes, Vivien [2 ]
Bouchacourt, Diane [2 ]
Jegou, Herve [2 ]
Bottou, Leon [2 ]
Affiliations
[1] Flatiron Inst, New York, NY 10010 USA
[2] Meta, FAIR, Menlo Pk, CA USA
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large language models based on transformers have achieved great empirical successes. However, as they are deployed more widely, there is a growing need to better understand their internal mechanisms in order to make them more reliable. These models appear to store vast amounts of knowledge from their training data, and to adapt quickly to new information provided in their context or prompt. We study how transformers balance these two types of knowledge by considering a synthetic setup where tokens are generated from either global or context-specific bigram distributions. Through a careful empirical analysis of the training process on a simplified two-layer transformer, we illustrate the fast learning of global bigrams and the slower development of an "induction head" mechanism for the in-context bigrams. We highlight the role of weight matrices as associative memories, provide theoretical insights on how gradients enable their learning during training, and study the role of data-distributional properties.
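The abstract packs in three mechanisms that are easier to see in code: the synthetic global/in-context bigram data, the induction-head rule that copies the token following an earlier occurrence of the current token, and the view of a weight matrix as an associative memory built from outer products. The numpy sketch below is a minimal illustration of all three; the parameter values (V, T, n_triggers, d) and variable names are illustrative assumptions, not the paper's configuration, and it is not a reproduction of the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic bigram data (parameter values are illustrative) ---
V = 64          # vocabulary size
T = 256         # sequence length
n_triggers = 3  # trigger tokens whose next token is sequence-specific

# Global bigram model: each row of P is a next-token distribution.
P = rng.dirichlet(np.ones(V), size=V)
triggers = rng.choice(V, size=n_triggers, replace=False)

def sample_sequence():
    # Each sequence draws a fresh output token for every trigger;
    # all other transitions follow the fixed global bigram model.
    outputs = {int(q): int(rng.integers(V)) for q in triggers}
    seq = [int(rng.integers(V))]
    for _ in range(T - 1):
        prev = seq[-1]
        nxt = outputs[prev] if prev in outputs else int(rng.choice(V, p=P[prev]))
        seq.append(nxt)
    return np.array(seq)

seq = sample_sequence()

# --- The induction-head prediction rule, as pure indexing ---
# Attend to whatever followed the previous occurrence of the current
# token and copy it; this recovers the in-context (trigger, output)
# bigrams that the global model cannot predict.
def induction_predict(seq, t):
    prev = np.where(seq[:t] == seq[t])[0]
    return int(seq[prev[-1] + 1]) if len(prev) > 0 else None

# --- A weight matrix as an associative memory ---
# Store (key, value) pairs as a sum of outer products W = sum_i v_i u_i^T.
# Random high-dimensional embeddings are near-orthonormal, so W @ u_j
# approximately retrieves v_j.
d = 512
U = rng.standard_normal((V, d)) / np.sqrt(d)    # key embeddings
Wv = rng.standard_normal((V, d)) / np.sqrt(d)   # value embeddings

pairs = [(k, int(rng.integers(V))) for k in map(int, triggers)]
W = sum(np.outer(Wv[v], U[k]) for k, v in pairs)

for k, v in pairs:
    scores = Wv @ (W @ U[k])  # compare the retrieval to every value
    print(k, "->", int(scores.argmax()), "(stored:", v, ")")
```

The retrieval in the final loop works because random d-dimensional embeddings are nearly orthogonal, so the cross terms in W @ U[k] stay small; this is the same intuition behind the paper's associative-memory view of trained weight matrices.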
Pages: 29
Related Papers
50 records in total
  • [1] Birth of biomedicine, from the viewpoint of an historian
    Picard, JF
    M S-MEDECINE SCIENCES, 1996, 12 (01): 97-101
  • [2] A Viewpoint: A Memory Safety Manifesto
    Wallach, Dan S.
    Lord, Bob
    IEEE SECURITY & PRIVACY, 2024, 22 (04): 18-21
  • [3] BRAID TRANSFORMER MEMORY
    ALDRICH, WH
    ALONSO, RL
    IEEE TRANSACTIONS ON ELECTRONIC COMPUTERS, 1966, EC-15 (04): 502+
  • [4] Transformer with Memory Replay
    Liu, Rui
    Mozafari, Barzan
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 7567-7575
  • [5] Recurrent Memory Transformer
    Bulatov, Aydar
    Kuratov, Yuri
    Burtsev, Mikhail S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [6] The birth of a memory
    Otten, LJ
    Rugg, MD
    TRENDS IN NEUROSCIENCES, 2002, 25 (06): 279-281
  • [7] ∞-former: Infinite Memory Transformer
    Martins, Pedro Henrique
    Marinho, Zita
    Martins, Andre F. T.
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1 (LONG PAPERS), 2022: 5468-5485
  • [8] DEVELOPING HIGH-RELIABILITY TRANSFORMER COMPONENTS: A MANUFACTURER'S VIEWPOINT
    RAHEJA, D
    MUENCH, FJ
    MCNULTY, WJ
    PROCEEDINGS ANNUAL RELIABILITY AND MAINTAINABILITY SYMPOSIUM, 1980: 442-447
  • [9] VISUAL MEMORY AT BIRTH
    SLATER, A
    MORISON, V
    ROSE, D
    BRITISH JOURNAL OF PSYCHOLOGY, 1982, 73 (NOV): 519-525
  • [10] Response: The birth of a memory
    Fernández, G
    Fell, J
    Fries, P
    TRENDS IN NEUROSCIENCES, 2002, 25 (06): 281-282