Benchmarking Large Language Models: Opportunities and Challenges

Authors
Hodak, Miro [1]
Ellison, David [2]
Van Buren, Chris [2]
Jiang, Xiaotong [2]
Dholakia, Ajay [2]
Affiliations
[1] AMD, Data Ctr Solut Grp, Austin, TX 78735 USA
[2] Lenovo, Infrastruct Solut Grp, Morrisville, NC USA
Keywords
Artificial Intelligence; Inference; Training; MLPerf; TPCx-AI; Deep Learning; Performance; Large Language Models;
DOI
10.1007/978-3-031-68031-1_6
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the exponentially growing popularity of Large Language Models (LLMs) and LLM-based applications like ChatGPT and Bard, the Artificial Intelligence (AI) community of developers and users needs representative benchmarks to enable careful comparison across a variety of use cases. The set of metrics has grown beyond accuracy and throughput to include energy efficiency, bias, trust, and sustainability. This paper aims to provide an overview of popular LLMs from a benchmarking perspective. Key LLMs are described, and the associated datasets are characterized. A detailed discussion of benchmarking metrics covering the training and inference stages is provided, and challenges in evaluating these metrics are highlighted. A review of recent performance and benchmark submissions is included, and emerging trends are summarized. The paper lays the foundation for developing new benchmarks to allow informed comparison of different AI systems based on combinations of models, datasets, and metrics.
Pages: 77-89
Number of pages: 13
Related papers
50 in total
  • [41] LAraBench: Benchmarking Arabic AI with Large Language Models
    Qatar Computing Research Institute, HBKU, Qatar
    Unknown
    arXiv, 1600,
  • [42] BLESS: Benchmarking Large Language Models on Sentence Simplification
    Kew, Tannon
    Chi, Alison
    Vasquez-Rodriguez, Laura
    Agrawal, Sweta
    Aumiller, Dennis
    Alva-Manchego, Fernando
    Shardlow, Matthew
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 13291 - 13309
  • [43] Integration of Advanced Large Language Models into the Construction of Adverse Outcome Pathways: Opportunities and Challenges
    Shi, Haochun
    Zhao, Yanbin
    ENVIRONMENTAL SCIENCE & TECHNOLOGY, 2024, 58 (35) : 15355 - 15358
  • [44] Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and Privacy
    Bozkir, Efe
    Ozdel, Suleyman
    Lau, Ka Hei Carrie
    Wang, Mengdi
    Gao, Hong
    Kasneci, Enkelejda
    PROCEEDINGS OF THE 6TH CONFERENCE ON ACM CONVERSATIONAL USER INTERFACES, CUI 2024, 2024,
  • [45] TRAM: Benchmarking Temporal Reasoning for Large Language Models
    Wang, Yuqing
    Zhao, Yun
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 6389 - 6415
  • [46] Debiasing large language models: research opportunities
    Yogarajan, Vithya
    Dobbie, Gillian
    Keegan, Te Taka
    JOURNAL OF THE ROYAL SOCIETY OF NEW ZEALAND, 2025, 55 (02) : 372 - 395
  • [47] Large Language Models in Education Embracing opportunities, confronting challenges, and shaping the next chapter together
    Liu, Bingbin B.
    XRDS: Crossroads, 2024, 31 (01) : 7 - 9
  • [48] Benchmarking large language models for biomedical natural language processing applications and recommendations
    Chen, Qingyu
    Hu, Yan
    Peng, Xueqing
    Xie, Qianqian
    Jin, Qiao
    Gilson, Aidan
    Singer, Maxwell B.
    Ai, Xuguang
    Lai, Po-Ting
    Wang, Zhizheng
    Keloth, Vipina K.
    Raja, Kalpana
    Huang, Jimin
    He, Huan
    Lin, Fongci
    Du, Jingcheng
    Zhang, Rui
    Zheng, W. Jim
    Adelman, Ron A.
    Lu, Zhiyong
    Xu, Hua
    NATURE COMMUNICATIONS, 2025, 16 (01)
  • [49] Benchmarking Large Language Models in Retrieval-Augmented Generation
    Chen, Jiawei
    Lin, Hongyu
    Han, Xianpei
    Sun, Le
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 17754 - 17762
  • [50] SEED-Bench: Benchmarking Multimodal Large Language Models
    Li, Bohao
    Ge, Yuying
    Ge, Yixiao
    Wang, Guangzhi
    Wang, Rui
    Zhang, Ruimao
    Shi, Ying
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 13299 - 13308