Evaluating the Performance of Large Language Models for Spanish Language in Undergraduate Admissions Exams

Cited by: 0

Authors:

Miranda, Sabino [1]

Pichardo-Lagunas, Obdulia [1]

Martinez-Seis, Bella [1]

Baldi, Pierre [2]

Affiliations:

[1] Inst Politecn Nacl IPN, UPIITA, Mexico City, Mexico

[2] Univ Calif Irvine, Irvine, CA USA

Source:

COMPUTACION Y SISTEMAS | 2023 / Vol. 27 / No. 04

Keywords:

Large Language Models; ChatGPT; BARD; Undergraduate Admissions Exams

DOI:

10.13053/CyS-27-4-4790

Chinese Library Classification number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

This study evaluates the performance of large language models, specifically GPT-3.5 and BARD (supported by the Gemini Pro model), in undergraduate admissions exams proposed by [...] cover Engineering/Mathematical and Physical Sciences, Biological and Medical Sciences, and Social and Administrative Sciences. Both models demonstrated proficiency, exceeding the minimum acceptance scores [...] academic programs. GPT-3.5 outperformed BARD in Mathematics and Physics, while BARD performed better [...]. Overall, GPT-3.5 marginally surpassed BARD with scores of 60.94% and 60.42%, respectively.


Pages: 1241-1248

Page count: 8
