How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model

Times cited: 0
Authors
Hanna, Michael [1 ]
Liu, Ollie [2 ]
Variengien, Alexandre [3 ]
Affiliations
[1] Univ Amsterdam, ILLC, Amsterdam, Netherlands
[2] Univ Southern Calif, Los Angeles, CA 90007 USA
[3] Redwood Res, Redwood City, CA USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Pre-trained language models can be surprisingly adept at tasks they were not explicitly trained on, but how they implement these capabilities is poorly understood. In this paper, we investigate the basic mathematical abilities often acquired by pre-trained language models. Concretely, we use mechanistic interpretability techniques to explain the (limited) mathematical abilities of GPT-2 small. As a case study, we examine its ability to take in sentences such as "The war lasted from the year 1732 to the year 17", and predict valid two-digit end years (years > 32). We first identify a circuit, a small subset of GPT-2 small's computational graph that computes this task's output. Then, we explain the role of each circuit component, showing that GPT-2 small's final multi-layer perceptrons boost the probability of end years greater than the start year. Finally, we find related tasks that activate our circuit. Our results suggest that GPT-2 small computes greater-than using a complex mechanism that activates across diverse contexts.
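To make the task concrete, below is a minimal sketch (not the authors' released code) of how one might probe GPT-2 small on a single greater-than prompt using the HuggingFace transformers library. The prompt is the one quoted in the abstract; the library choice, the valid/invalid probability split, and the assumption that each string "00" through "99" maps to a single GPT-2 token are illustrative assumptions of this sketch.

```python
# Minimal sketch of the greater-than task: given a prompt ending in the first
# two digits of the end year, read off GPT-2 small's probability mass on
# valid end years (> start year) versus invalid ones (<= start year).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")        # GPT-2 small
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

prompt = "The war lasted from the year 1732 to the year 17"
start_yy = 32   # last two digits of the start year in the prompt

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]  # logits for next token
probs = torch.softmax(next_token_logits, dim=-1)

# Probability of each candidate two-digit end year "00".."99".
# Assumption of this sketch: each such string is a single GPT-2 token.
year_token_ids = [tokenizer.encode(f"{yy:02d}")[0] for yy in range(100)]
year_probs = probs[year_token_ids]

p_valid = year_probs[start_yy + 1:].sum().item()      # end years > start year
p_invalid = year_probs[: start_yy + 1].sum().item()   # end years <= start year
print(f"P(valid end year) = {p_valid:.3f}, P(invalid end year) = {p_invalid:.3f}")
```

The gap between these two probabilities is one natural way to score the behavior the abstract describes; a circuit analysis like the paper's would then ablate or patch individual heads and MLPs and watch how that gap changes.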
Pages: 28