Birth of a Transformer: A Memory Viewpoint

Cited by: 0
Authors
Bietti, Alberto [1 ,2 ]
Cabannes, Vivien [2 ]
Bouchacourt, Diane [2 ]
Jegou, Herve [2 ]
Bottou, Leon [2 ]
Affiliations
[1] Flatiron Inst, New York, NY 10010 USA
[2] Meta, FAIR, Menlo Pk, CA USA
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large language models based on transformers have achieved great empirical successes. However, as they are deployed more widely, there is a growing need to better understand their internal mechanisms in order to make them more reliable. These models appear to store vast amounts of knowledge from their training data, and to adapt quickly to new information provided in their context or prompt. We study how transformers balance these two types of knowledge by considering a synthetic setup where tokens are generated from either global or context-specific bigram distributions. Through a careful empirical analysis of the training process on a simplified two-layer transformer, we illustrate the fast learning of global bigrams and the slower development of an "induction head" mechanism for the in-context bigrams. We highlight the role of weight matrices as associative memories, provide theoretical insights on how gradients enable their learning during training, and study the role of data-distributional properties.
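The abstract packs in three mechanisms that are easier to see in code: the synthetic global/in-context bigram data, the induction-head rule that copies the token following an earlier occurrence of the current token, and the view of a weight matrix as an associative memory built from outer products. The numpy sketch below is a minimal illustration of all three; the parameter values (V, T, n_triggers, d) and variable names are illustrative assumptions, not the paper's configuration, and it is not a reproduction of the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic bigram data (parameter values are illustrative) ---
V = 64          # vocabulary size
T = 256         # sequence length
n_triggers = 3  # trigger tokens whose next token is sequence-specific

# Global bigram model: each row of P is a next-token distribution.
P = rng.dirichlet(np.ones(V), size=V)
triggers = rng.choice(V, size=n_triggers, replace=False)

def sample_sequence():
    # Each sequence draws a fresh output token for every trigger;
    # all other transitions follow the fixed global bigram model.
    outputs = {int(q): int(rng.integers(V)) for q in triggers}
    seq = [int(rng.integers(V))]
    for _ in range(T - 1):
        prev = seq[-1]
        nxt = outputs[prev] if prev in outputs else int(rng.choice(V, p=P[prev]))
        seq.append(nxt)
    return np.array(seq)

seq = sample_sequence()

# --- The induction-head prediction rule, as pure indexing ---
# Attend to whatever followed the previous occurrence of the current
# token and copy it; this recovers the in-context (trigger, output)
# bigrams that the global model cannot predict.
def induction_predict(seq, t):
    prev = np.where(seq[:t] == seq[t])[0]
    return int(seq[prev[-1] + 1]) if len(prev) > 0 else None

# --- A weight matrix as an associative memory ---
# Store (key, value) pairs as a sum of outer products W = sum_i v_i u_i^T.
# Random high-dimensional embeddings are near-orthonormal, so W @ u_j
# approximately retrieves v_j.
d = 512
U = rng.standard_normal((V, d)) / np.sqrt(d)    # key embeddings
Wv = rng.standard_normal((V, d)) / np.sqrt(d)   # value embeddings

pairs = [(k, int(rng.integers(V))) for k in map(int, triggers)]
W = sum(np.outer(Wv[v], U[k]) for k, v in pairs)

for k, v in pairs:
    scores = Wv @ (W @ U[k])  # compare the retrieval to every value
    print(k, "->", int(scores.argmax()), "(stored:", v, ")")
```

The retrieval in the final loop works because random d-dimensional embeddings are nearly orthogonal, so the cross terms in W @ U[k] stay small; this is the same intuition behind the paper's associative-memory view of trained weight matrices.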
Pages: 29
Related Papers
50 records in total
  • [1] Birth of biomedicine, from the viewpoint of an historian
    Picard, JF
    M S-MEDECINE SCIENCES, 1996, 12 (01): 97-101
  • [2] A Viewpoint: A Memory Safety Manifesto
    Wallach, Dan S.
    Lord, Bob
    IEEE SECURITY & PRIVACY, 2024, 22 (04): 18-21
  • [3] BRAID TRANSFORMER MEMORY
    ALDRICH, WH
    ALONSO, RL
    IEEE TRANSACTIONS ON ELECTRONIC COMPUTERS, 1966, EC-15 (04): 502+
  • [4] Transformer with Memory Replay
    Liu, Rui
    Mozafari, Barzan
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 7567-7575
  • [5] Recurrent Memory Transformer
    Bulatov, Aydar
    Kuratov, Yuri
    Burtsev, Mikhail S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [6] The birth of a memory
    Otten, LJ
    Rugg, MD
    TRENDS IN NEUROSCIENCES, 2002, 25 (06): 279-281
  • [7] ∞-former: Infinite Memory Transformer
    Martins, Pedro Henrique
    Marinho, Zita
    Martins, Andre F. T.
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1 (LONG PAPERS), 2022: 5468-5485
  • [8] DEVELOPING HIGH-RELIABILITY TRANSFORMER COMPONENTS: A MANUFACTURER'S VIEWPOINT
    RAHEJA, D
    MUENCH, FJ
    MCNULTY, WJ
    PROCEEDINGS ANNUAL RELIABILITY AND MAINTAINABILITY SYMPOSIUM, 1980: 442-447
  • [9] VISUAL MEMORY AT BIRTH
    SLATER, A
    MORISON, V
    ROSE, D
    BRITISH JOURNAL OF PSYCHOLOGY, 1982, 73 (NOV): 519-525
  • [10] Response: The birth of a memory
    Fernández, G
    Fell, J
    Fries, P
    TRENDS IN NEUROSCIENCES, 2002, 25 (06): 281-282