Establishing vocabulary tests as a benchmark for evaluating large language models

Cited by: 0
Authors
Martinez, Gonzalo [1 ]
Conde, Javier [2 ]
Merino-Gomez, Elena [3 ]
Bermudez-Margaretto, Beatriz [4 ]
Hernandez, Jose Alberto [1 ]
Reviriego, Pedro [2 ]
Brysbaert, Marc [5 ]
Affiliations
[1] Univ Carlos III Madrid, Dept Ingn Telemat, Leganes, Spain
[2] Univ Politecn Madrid, ETSI Telecomunicac, Madrid, Spain
[3] Univ Valladolid, Escuela Ingn Ind, Valladolid, Spain
[4] Univ Salamanca, Dept Psicol Basica Psicobiol & Metodol Las CC Com, Salamanca, Spain
[5] Univ Ghent, Dept Expt Psychol, Ghent, Belgium
Source
PLOS ONE, 2024, Vol. 19, Issue 12
Keywords
WORD RECOGNITION; ACQUISITION; LEXTALE
DOI
10.1371/journal.pone.0308259
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09
Abstract
Vocabulary tests, once a cornerstone of language modeling evaluation, have been largely overlooked in the current landscape of Large Language Models (LLMs) like Llama 2, Mistral, and GPT. While most LLM evaluation benchmarks focus on specific tasks or domain-specific knowledge, they often neglect the fundamental linguistic aspects of language understanding. In this paper, we advocate for the revival of vocabulary tests as a valuable tool for assessing LLM performance. We evaluate seven LLMs using two vocabulary test formats across two languages and uncover surprising gaps in their lexical knowledge. These findings shed light on the intricacies of LLM word representations, their learning mechanisms, and performance variations across models and languages. Moreover, the ability to automatically generate and perform vocabulary tests offers new opportunities to expand the approach and provide a more complete picture of LLMs' language skills.
Pages: 17
Related papers
50 items in total
  • [41] Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation
    Gao, Dawei
    Wang, Haibin
    Li, Yaliang
    Sun, Xiuyu
    Qian, Yichen
    Ding, Bolin
    Zhou, Jingren
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2024, 17(5): 1132-1145
  • [42] Large vocabulary speech recognition of Slovenian language using morphological models
    Maucec, M
    Rotovnik, T
    Kacic, Z
    Horvat, B
    IEEE REGION 8 EUROCON 2003, VOL B, PROCEEDINGS: COMPUTER AS A TOOL, 2003: 158-161
  • [43] Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
    Hoelscher-Obermaier, Jason
    Persson, Julia H.
    Kran, Esben
    Konstas, Ioannis
    Barez, Fazl
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023: 11548-11559
  • [44] Evaluating large language models on medical evidence summarization
    Tang, Liyan
    Sun, Zhaoyi
    Idnay, Betina
    Nestor, Jordan G.
    Soroush, Ali
    Elias, Pierre A.
    Xu, Ziyang
    Ding, Ying
    Durrett, Greg
    Rousseau, Justin F.
    Weng, Chunhua
    Peng, Yifan
    NPJ DIGITAL MEDICINE, 2023, 6(1)
  • [45] Methodological Challenges in Evaluating Large Language Models in Radiology
    Li, David
    Kim, Woojin
    Yi, Paul H.
    RADIOLOGY, 2024, 313(3)
  • [46] CLAIR: Evaluating Image Captions with Large Language Models
    Chan, David M.
    Petryk, Suzanne
    Gonzalez, Joseph E.
    Darrell, Trevor
    Canny, John
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023: 13638-13646
  • [48] Baby steps in evaluating the capacities of large language models
    Frank, Michael C.
    NATURE REVIEWS PSYCHOLOGY, 2023, 2(8): 451-452
  • [49] Evaluating the ability of large language models to emulate personality
    Wang, Yilei
    Zhao, Jiabao
    Ones, Deniz S.
    He, Liang
    Xu, Xin
    SCIENTIFIC REPORTS, 2025, 15(1)
  • [50] Evaluating Large Language Models on Controlled Generation Tasks
    Sun, Jiao
    Tian, Yufei
    Zhou, Wangchunshu
    Xu, Nan
    Hu, Qian
    Gupta, Rahul
    Wieting, John
    Peng, Nanyun
    Ma, Xuezhe
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023: 3155-3168