Math-LLMs: AI Cyberinfrastructure with Pre-trained Transformers for Math Education

Cited by: 3
Authors
Zhang, Fan [1 ]
Li, Chenglu [2 ]
Henkel, Owen [3 ]
Xing, Wanli [1 ]
Baral, Sami [4 ]
Heffernan, Neil [4 ]
Li, Hai [1 ]
Affiliations
[1] Univ Florida, Gainesville, FL 32611 USA
[2] Univ Utah, Salt Lake City, UT USA
[3] Rising Acad Network, Freetown, Sierra Leone
[4] Worcester Polytech Inst, Worcester, MA USA
Keywords
LLMs; Math education; Pre-train; MATHEMATICS;
DOI
10.1007/s40593-024-00416-y
Chinese Library Classification (CLC) number
TP39 [Applications of Computers]
Subject classification codes
081203; 0835
Abstract
In recent years, the pre-training of Large Language Models (LLMs) for the educational domain has garnered significant attention. However, a discernible gap exists in the application of these models to mathematics education. This study aims to bridge this gap by pre-training LLMs on authentic K-12 mathematical dialogue datasets. Our research is structured around three primary research questions (RQs) that investigate the impact of fine-tuning data size and of pre-training on downstream Natural Language Processing (NLP) tasks, as well as the efficacy of LLMs in text generation tasks within the mathematical context. Our findings indicate that data size plays a pivotal role in the performance of LLMs on downstream NLP tasks, with larger fine-tuning datasets yielding more consistent and improved results. Furthermore, pre-trained models consistently outperformed their non-pre-trained counterparts, underscoring the importance of leveraging prior knowledge in LLMs. In text generation, our model not only enhanced mathematical understanding and performance on downstream math tasks but also produced more engaging and human-like language.
Pages: 24
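
The abstract describes continued (domain-adaptive) pre-training of transformer models on K-12 math dialogue text before fine-tuning them on downstream NLP tasks. As a rough illustration only, the sketch below shows how such continued pre-training might look with the Hugging Face Transformers library; the base checkpoint (bert-base-uncased), the file math_dialogues.txt, and all hyperparameters are illustrative assumptions, not the paper's actual configuration.

# Illustrative sketch of domain-adaptive ("continued") pre-training on an
# in-domain math dialogue corpus. Model, data path, and hyperparameters
# are assumptions for demonstration, not the authors' setup.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "bert-base-uncased"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForMaskedLM.from_pretrained(BASE_MODEL)

# Hypothetical plain-text file with one K-12 math tutoring utterance per line.
dataset = load_dataset("text", data_files={"train": "math_dialogues.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Masked-language-modeling objective over the in-domain corpus.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="math-llm-pretrained",
        per_device_train_batch_size=16,
        num_train_epochs=3,
    ),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()  # the saved checkpoint would then be fine-tuned per downstream task

After this continued pre-training step, the resulting checkpoint would be fine-tuned separately on each downstream math NLP task, which is where the abstract's comparison of fine-tuning data sizes and of pre-trained versus non-pre-trained models applies.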