Math-LLMs: AI Cyberinfrastructure with Pre-trained Transformers for Math Education

Cited by: 3
Authors
Zhang, Fan [1 ]
Li, Chenglu [2 ]
Henkel, Owen [3 ]
Xing, Wanli [1 ]
Baral, Sami [4 ]
Heffernan, Neil [4 ]
Li, Hai [1 ]
Affiliations
[1] Univ Florida, Gainesville, FL 32611 USA
[2] Univ Utah, Salt Lake City, UT USA
[3] Rising Acad Network, Freetown, Sierra Leone
[4] Worcester Polytech Inst, Worcester, MA USA
Keywords
LLMs; Math education; Pre-train; MATHEMATICS;
DOI
10.1007/s40593-024-00416-y
Chinese Library Classification (CLC)
TP39 [Computer applications];
Subject classification codes
081203; 0835
Abstract
In recent years, the pre-training of Large Language Models (LLMs) in the educational domain has garnered significant attention. However, a discernible gap exists in the application of these models to mathematics education. This study aims to bridge this gap by pre-training LLMs on authentic K-12 mathematical dialogue datasets. Our research is structured around three primary research questions (RQs) that investigate the impact of fine-tuning data size and of pre-training on downstream Natural Language Processing (NLP) tasks, as well as the efficacy of LLMs in text generation tasks within the mathematical context. Our findings indicate that data size plays a pivotal role in the performance of LLMs on downstream NLP tasks, with larger datasets yielding more consistent and improved results. Furthermore, pre-trained models consistently outperformed their non-pre-trained counterparts, emphasizing the importance of leveraging prior knowledge in LLMs. In the realm of text generation, we found that our model not only enhances mathematical understanding and performance on downstream math tasks but also generates more engaging and human-like language.
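The two-stage workflow described in the abstract (continued pre-training on math dialogue text, then fine-tuning on a downstream task) can be illustrated with a minimal Hugging Face Transformers sketch. This is not the authors' released code; the base checkpoint (bert-base-uncased), the toy dialogue strings, the binary response-label task, and the output directory names are all assumptions made for illustration.

```python
# Illustrative sketch only: domain-adaptive (continued) pre-training of a
# BERT-style model on math dialogue text, followed by fine-tuning on a
# hypothetical downstream classification task. Data, labels, and paths are
# placeholders, not the paper's datasets.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "bert-base-uncased"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# --- Stage 1: continued pre-training (masked LM) on math dialogue text ---
dialogue_texts = ["Tutor: What is 3/4 of 12?", "Student: I think it is 9."]  # toy stand-in
lm_data = Dataset.from_dict({"text": dialogue_texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)
mlm_model = AutoModelForMaskedLM.from_pretrained(BASE_MODEL)
Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="math-lm-pretrain", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=lm_data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
).train()
mlm_model.save_pretrained("math-lm-pretrain")
tokenizer.save_pretrained("math-lm-pretrain")

# --- Stage 2: fine-tune the adapted encoder on a downstream NLP task ---
# e.g., classifying student math responses (labels here are hypothetical).
clf_texts, clf_labels = ["I think it is 9.", "It is 8."], [1, 0]
clf_data = Dataset.from_dict({"text": clf_texts, "label": clf_labels}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    batched=True,
)
clf_model = AutoModelForSequenceClassification.from_pretrained(
    "math-lm-pretrain", num_labels=2
)
Trainer(
    model=clf_model,
    args=TrainingArguments(output_dir="math-clf", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=clf_data,
    tokenizer=tokenizer,
).train()
```

Varying the size of the Stage 2 fine-tuning set and comparing the adapted checkpoint against the unadapted base model corresponds, in spirit, to the data-size and pre-training comparisons the study reports.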
Pages: 24
Related papers (50 in total)
  • [1] Improving Math Word Problems with Pre-trained Knowledge and Hierarchical Reasoning
    Yu, Weijiang
    Wen, Yingpeng
    Zheng, Fudan
    Xiao, Nong
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3384 - 3394
  • [2] Are Pre-trained Convolutions Better than Pre-trained Transformers?
    Tay, Yi
    Dehghani, Mostafa
    Gupta, Jai
    Aribandi, Vamsi
    Bahri, Dara
    Qin, Zhen
    Metzler, Donald
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 4349 - 4359
  • [3] Calibration of Pre-trained Transformers
    Desai, Shrey
    Durrett, Greg
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 295 - 302
  • [4] Classifying Math Knowledge Components via Task-Adaptive Pre-Trained BERT
    Shen, Jia Tracy
    Yamashita, Michiharu
    Prihar, Ethan
    Heffernan, Neil
    Wu, Xintao
    McGrew, Sean
    Lee, Dongwon
    ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT I, 2021, 12748 : 408 - 419
  • [5] Emergent Modularity in Pre-trained Transformers
    Zhang, Zhengyan
    Zeng, Zhiyuan
    Lin, Yankai
    Xiao, Chaojun
    Wang, Xiaozhi
    Han, Xu
    Liu, Zhiyuan
    Xie, Ruobing
    Sun, Maosong
    Zhou, Jie
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 4066 - 4083
  • [6] Pre-trained transformers: an empirical comparison
    Casola, Silvia
    Lauriola, Ivano
    Lavelli, Alberto
    MACHINE LEARNING WITH APPLICATIONS, 2022, 9
  • [7] Face Inpainting with Pre-trained Image Transformers
    Gonc, Kaan
    Saglam, Baturay
    Kozat, Suleyman S.
    Dibeklioglu, Hamdi
    2022 30TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2022,
  • [8] Can LLMs Facilitate Interpretation of Pre-trained Language Models?
    Mousi, Basel
    Durrani, Nadir
    Dalvi, Fahim
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 3248 - 3268
  • [9] How Different are Pre-trained Transformers for Text Ranking?
    Rau, David
    Kamps, Jaap
    ADVANCES IN INFORMATION RETRIEVAL, PT II, 2022, 13186 : 207 - 214
  • [10] Efficient feature selection for pre-trained vision transformers
    Huang, Lan
    Zeng, Jia
    Yu, Mengqiang
    Ding, Weiping
    Bai, Xingyu
    Wang, Kangping
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 254