Benchmarking Large Language Models: Opportunities and Challenges

Cited by: 0
Authors
Hodak, Miro [1 ]
Ellison, David [2 ]
Van Buren, Chris [2 ]
Jiang, Xiaotong [2 ]
Dholakia, Ajay [2 ]
Affiliations
[1] AMD, Data Center Solutions Group, Austin, TX 78735 USA
[2] Lenovo, Infrastructure Solutions Group, Morrisville, NC USA
Keywords
Artificial Intelligence; Inference; Training; MLPerf; TPCx-AI; Deep Learning; Performance; Large Language Models
DOI
10.1007/978-3-031-68031-1_6
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
With the exponentially growing popularity of Large Language Models (LLMs) and LLM-based applications such as ChatGPT and Bard, the Artificial Intelligence (AI) community of developers and users needs representative benchmarks to enable careful comparison across a variety of use cases. The set of relevant metrics has grown beyond accuracy and throughput to include energy efficiency, bias, trust, and sustainability. This paper provides an overview of popular LLMs from a benchmarking perspective. Key LLMs are described, and the associated datasets are characterized. A detailed discussion of benchmarking metrics covering the training and inference stages is provided, and the challenges in evaluating these metrics are highlighted. A review of recent performance and benchmark submissions is included, and emerging trends are summarized. The paper lays the foundation for developing new benchmarks that allow informed comparison of AI systems based on combinations of models, datasets, and metrics.
Pages: 77-89 (13 pages)
Related Papers
50 items in total
  • [1] Large language models in psychiatry: Opportunities and challenges
    Volkmer, Sebastian
    Meyer-Lindenberg, Andreas
    Schwarz, Emanuel
    PSYCHIATRY RESEARCH, 2024, 339
  • [2] ChatGPT and large language models in academia: opportunities and challenges
    Meyer, Jesse G.
    Urbanowicz, Ryan J.
    Martin, Patrick C. N.
    O'Connor, Karen
    Li, Ruowang
    Peng, Pei-Chen
    Bright, Tiffani J.
    Tatonetti, Nicholas
    Won, Kyoung Jae
    Gonzalez-Hernandez, Graciela
    Moore, Jason H.
    BIODATA MINING, 2023, 16 (01)
  • [3] The Social Opportunities and Challenges in the Era of Large Language Models
    Chen, Huimin
    Liu, Zhiyuan
    Sun, Maosong
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2024, 61 (05): 1094 - 1103
  • [4] Large Language Models: Opportunities and Challenges For Cognitive Assessment
    Efremova, Maria
    Kubiak, Emeric
    Baron, Simon
    Bernard, David
    EUROPEAN JOURNAL OF PSYCHOLOGY OPEN, 2023, 82 : 133 - 134
  • [5] Embracing Large Language Models for Medical Applications: Opportunities and Challenges
    Karabacak, Mert
    Margetis, Konstantinos
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2023, 15 (05)
  • [6] Large language models for building energy applications: Opportunities and challenges
    Liu, Mingzhe
    Zhang, Liang
    Chen, Jianli
    Chen, Wei-An
    Yang, Zhiyao
    Lo, L. James
    Wen, Jin
    O'Neill, Zheng
    BUILDING SIMULATION, 2025, 18 (02) : 225 - 234
  • [7] Large Language Models for Business Process Management: Opportunities and Challenges
    Vidgof, Maxim
    Bachhofner, Stefan
    Mendling, Jan
    BUSINESS PROCESS MANAGEMENT FORUM, BPM 2023 FORUM, 2023, 490 : 107 - 123
  • [8] Challenges and Opportunities of Moderating Usage of Large Language Models in Education
    Krupp, Lars
    Steinert, Steffen
    Kiefer-Emmanouilidis, Maximilian
    Avila, Karina E.
    Lukowicz, Paul
    Kuhn, Jochen
    Kuechemann, Stefan
    Karolus, Jakob
    AI FOR EDUCATION WORKSHOP, 2024, 257 : 9 - 17
  • [9] Large Language Models and Future of Information Retrieval: Opportunities and Challenges
    Zhai, ChengXiang
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 481 - 490