A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models

Authors
Stolfo, Alessandro [1]
Jin, Zhijing [1,2]
Shridhar, Kumar [1]
Scholkopf, Bernhard [1,2]
Sachan, Mrinmaya [1]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] MPI, Zurich, Switzerland
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We have recently witnessed a number of impressive results on hard mathematical reasoning problems with language models. At the same time, the robustness of these models has also been called into question; recent works have shown that models can rely on shallow patterns in the problem description when generating a solution. Building on the idea of behavioral testing, we propose a novel framework, which pins down the causal effect of various factors in the input, e.g., the surface form of the problem text, the operands, and math operators on the output solution. By grounding the behavioral analysis in a causal graph describing an intuitive reasoning process, we study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space. We apply our framework on a test bed of math word problems. Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
Pages: 545-561
Page count: 17