Math-LLMs: AI Cyberinfrastructure with Pre-trained Transformers for Math Education

Cited by: 3
Authors
Zhang, Fan [1 ]
Li, Chenglu [2 ]
Henkel, Owen [3 ]
Xing, Wanli [1 ]
Baral, Sami [4 ]
Heffernan, Neil [4 ]
Li, Hai [1 ]
Affiliations
[1] Univ Florida, Gainesville, FL 32611 USA
[2] Univ Utah, Salt Lake City, UT USA
[3] Rising Acad Network, Freetown, Sierra Leone
[4] Worcester Polytech Inst, Worcester, MA USA
Keywords
LLMs; Math education; Pre-train; MATHEMATICS;
DOI
10.1007/s40593-024-00416-y
Chinese Library Classification (CLC)
TP39 [Computer applications];
Subject classification codes
081203; 0835
Abstract
In recent years, the pre-training of Large Language Models (LLMs) in the educational domain has garnered significant attention. However, a discernible gap exists in the application of these models to mathematics education. This study aims to bridge this gap by pre-training LLMs on authentic K-12 mathematical dialogue datasets. Our research is structured around three primary research questions (RQs) that investigate the impact of fine-tuning data size and of pre-training on downstream Natural Language Processing (NLP) tasks, as well as the efficacy of LLMs in text generation tasks within the mathematical context. Our findings indicate that data size plays a pivotal role in the performance of LLMs on downstream NLP tasks, with larger datasets yielding more consistent and improved results. Furthermore, pre-trained models consistently outperformed their non-pre-trained counterparts, emphasizing the importance of leveraging prior knowledge in LLMs. In the realm of text generation, we found that our model not only enhances mathematical understanding and performance on downstream math tasks but also generates more engaging and human-like language.
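The two-stage workflow described in the abstract (continued pre-training on math dialogue text, then fine-tuning on a downstream task) can be illustrated with a minimal Hugging Face Transformers sketch. This is not the authors' released code; the base checkpoint (bert-base-uncased), the toy dialogue strings, the binary response-label task, and the output directory names are all assumptions made for illustration.

```python
# Illustrative sketch only: domain-adaptive (continued) pre-training of a
# BERT-style model on math dialogue text, followed by fine-tuning on a
# hypothetical downstream classification task. Data, labels, and paths are
# placeholders, not the paper's datasets.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "bert-base-uncased"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# --- Stage 1: continued pre-training (masked LM) on math dialogue text ---
dialogue_texts = ["Tutor: What is 3/4 of 12?", "Student: I think it is 9."]  # toy stand-in
lm_data = Dataset.from_dict({"text": dialogue_texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)
mlm_model = AutoModelForMaskedLM.from_pretrained(BASE_MODEL)
Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="math-lm-pretrain", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=lm_data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
).train()
mlm_model.save_pretrained("math-lm-pretrain")
tokenizer.save_pretrained("math-lm-pretrain")

# --- Stage 2: fine-tune the adapted encoder on a downstream NLP task ---
# e.g., classifying student math responses (labels here are hypothetical).
clf_texts, clf_labels = ["I think it is 9.", "It is 8."], [1, 0]
clf_data = Dataset.from_dict({"text": clf_texts, "label": clf_labels}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    batched=True,
)
clf_model = AutoModelForSequenceClassification.from_pretrained(
    "math-lm-pretrain", num_labels=2
)
Trainer(
    model=clf_model,
    args=TrainingArguments(output_dir="math-clf", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=clf_data,
    tokenizer=tokenizer,
).train()
```

Varying the size of the Stage 2 fine-tuning set and comparing the adapted checkpoint against the unadapted base model corresponds, in spirit, to the data-size and pre-training comparisons the study reports.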
Pages: 24
Related papers (50 in total)
  • [1] Improving Math Word Problems with Pre-trained Knowledge and Hierarchical Reasoning
    Yu, Weijiang
    Wen, Yingpeng
    Zheng, Fudan
    Xiao, Nong
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3384 - 3394
  • [2] Are Pre-trained Convolutions Better than Pre-trained Transformers?
    Tay, Yi
    Dehghani, Mostafa
    Gupta, Jai
    Aribandi, Vamsi
    Bahri, Dara
    Qin, Zhen
    Metzler, Donald
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 4349 - 4359
  • [3] Calibration of Pre-trained Transformers
    Desai, Shrey
    Durrett, Greg
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 295 - 302
  • [4] Classifying Math Knowledge Components via Task-Adaptive Pre-Trained BERT
    Shen, Jia Tracy
    Yamashita, Michiharu
    Prihar, Ethan
    Heffernan, Neil
    Wu, Xintao
    McGrew, Sean
    Lee, Dongwon
    ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT I, 2021, 12748 : 408 - 419
  • [5] Emergent Modularity in Pre-trained Transformers
    Zhang, Zhengyan
    Zeng, Zhiyuan
    Lin, Yankai
    Xiao, Chaojun
    Wang, Xiaozhi
    Han, Xu
    Liu, Zhiyuan
    Xie, Ruobing
    Sun, Maosong
    Zhou, Jie
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 4066 - 4083
  • [6] Pre-trained transformers: an empirical comparison
    Casola, Silvia
    Lauriola, Ivano
    Lavelli, Alberto
    MACHINE LEARNING WITH APPLICATIONS, 2022, 9
  • [7] Face Inpainting with Pre-trained Image Transformers
    Gonc, Kaan
    Saglam, Baturay
    Kozat, Suleyman S.
    Dibeklioglu, Hamdi
    2022 30TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2022,
  • [8] Can LLMs Facilitate Interpretation of Pre-trained Language Models?
    Mousi, Basel
    Durrani, Nadir
    Dalvi, Fahim
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 3248 - 3268
  • [9] How Different are Pre-trained Transformers for Text Ranking?
    Rau, David
    Kamps, Jaap
    ADVANCES IN INFORMATION RETRIEVAL, PT II, 2022, 13186 : 207 - 214
  • [10] Efficient feature selection for pre-trained vision transformers
    Huang, Lan
    Zeng, Jia
    Yu, Mengqiang
    Ding, Weiping
    Bai, Xingyu
    Wang, Kangping
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 254