How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model

Times cited: 0
Authors
Hanna, Michael [1 ]
Liu, Ollie [2 ]
Variengien, Alexandre [3 ]
Affiliations
[1] Univ Amsterdam, ILLC, Amsterdam, Netherlands
[2] Univ Southern Calif, Los Angeles, CA 90007 USA
[3] Redwood Res, Redwood City, CA USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Pre-trained language models can be surprisingly adept at tasks they were not explicitly trained on, but how they implement these capabilities is poorly understood. In this paper, we investigate the basic mathematical abilities often acquired by pre-trained language models. Concretely, we use mechanistic interpretability techniques to explain the (limited) mathematical abilities of GPT-2 small. As a case study, we examine its ability to take in sentences such as "The war lasted from the year 1732 to the year 17", and predict valid two-digit end years (years > 32). We first identify a circuit, a small subset of GPT-2 small's computational graph that computes this task's output. Then, we explain the role of each circuit component, showing that GPT-2 small's final multi-layer perceptrons boost the probability of end years greater than the start year. Finally, we find related tasks that activate our circuit. Our results suggest that GPT-2 small computes greater-than using a complex mechanism that activates across diverse contexts.
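To make the task concrete, below is a minimal sketch (not the authors' released code) of how one might probe GPT-2 small on a single greater-than prompt using the HuggingFace transformers library. The prompt is the one quoted in the abstract; the library choice, the valid/invalid probability split, and the assumption that each string "00" through "99" maps to a single GPT-2 token are illustrative assumptions of this sketch.

```python
# Minimal sketch of the greater-than task: given a prompt ending in the first
# two digits of the end year, read off GPT-2 small's probability mass on
# valid end years (> start year) versus invalid ones (<= start year).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")        # GPT-2 small
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

prompt = "The war lasted from the year 1732 to the year 17"
start_yy = 32   # last two digits of the start year in the prompt

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]  # logits for next token
probs = torch.softmax(next_token_logits, dim=-1)

# Probability of each candidate two-digit end year "00".."99".
# Assumption of this sketch: each such string is a single GPT-2 token.
year_token_ids = [tokenizer.encode(f"{yy:02d}")[0] for yy in range(100)]
year_probs = probs[year_token_ids]

p_valid = year_probs[start_yy + 1:].sum().item()      # end years > start year
p_invalid = year_probs[: start_yy + 1].sum().item()   # end years <= start year
print(f"P(valid end year) = {p_valid:.3f}, P(invalid end year) = {p_invalid:.3f}")
```

The gap between these two probabilities is one natural way to score the behavior the abstract describes; a circuit analysis like the paper's would then ablate or patch individual heads and MLPs and watch how that gap changes.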
Pages: 28