Toward a Holistic Performance Evaluation of Large Language Models Across Diverse AI Accelerators

Cited by: 0
Authors
Emani, Murali [1 ]
Foreman, Sam [1 ]
Sastry, Varuni [1 ]
Xie, Zhen [2 ]
Raskar, Siddhisanket [1 ]
Arnold, William [1 ]
Thakur, Rajeev [1 ]
Vishwanath, Venkatram [1 ]
Papka, Michael E. [1 ,3 ]
Shanmugavelu, Sanjif [4 ]
Gandhi, Darshan [5 ]
Zhao, Hengyu [5 ]
Ma, Dun [5 ]
Ranganath, Kiran [5 ]
Weisner, Rick [5 ]
Chen, Jiunn-yeu [6 ]
Yang, Yuting [6 ]
Vassilieva, Natalia [8 ]
Zhang, Bin C. [8 ]
Howland, Sylvia [8 ]
Tsyplikhin, Alexander [7 ]
Affiliations
[1] Argonne Natl Lab, Lemont, IL 60439 USA
[2] SUNY Binghamton, Binghamton, NY 13902 USA
[3] Univ Illinois, Chicago, IL 60607 USA
[4] Groq Inc, Mountain View, CA 94041 USA
[5] SambaNova Syst Inc, Palo Alto, CA 94303 USA
[6] Intel Habana, Santa Clara, CA 95054 USA
[7] Graphcore Inc, Palo Alto, CA 94301 USA
[8] Cerebras Syst, Sunnyvale, CA 94085 USA
DOI
10.1109/IPDPSW63119.2024.00016
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Artificial intelligence (AI) methods have become critical in scientific applications to help accelerate scientific discovery. Large language models (LLMs) are considered a promising approach to address some challenging problems because of their superior generalization capabilities across domains. The effectiveness of the models and the accuracy of the applications are contingent upon their efficient execution on the underlying hardware infrastructure. Specialized AI accelerator hardware systems have recently become available for accelerating AI applications. However, the comparative performance of these AI accelerators on large language models has not been previously studied. In this paper, we systematically study LLMs on multiple AI accelerators and GPUs and evaluate their performance characteristics for these models. We evaluate these systems with (i) a micro-benchmark using a core transformer block, (ii) a GPT-2 model, and (iii) an LLM-driven science use case, GenSLM. We present our findings and analyses of the models' performance to better understand the intrinsic capabilities of AI accelerators. Furthermore, our analysis takes into account key factors such as sequence lengths, scaling behavior, and sensitivity to gradient accumulation steps.
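To make the methodology concrete, the following is a minimal sketch (not taken from the paper) of the kind of transformer-block micro-benchmark the abstract describes: timing a single core block across a sweep of sequence lengths and reporting throughput. All hyperparameters (hidden size, head count, batch size, sweep values, iteration counts) are illustrative assumptions, not the paper's actual configuration.

```python
import time
import torch
import torch.nn as nn

# Illustrative, GPT-2-small-like dimensions (assumed, not from the paper).
HIDDEN, HEADS, BATCH, ITERS = 768, 12, 8, 20

device = "cuda" if torch.cuda.is_available() else "cpu"
block = nn.TransformerEncoderLayer(
    d_model=HIDDEN, nhead=HEADS, dim_feedforward=4 * HIDDEN,
    batch_first=True,
).to(device).eval()

with torch.no_grad():
    for seq_len in (128, 256, 512, 1024):  # sweep sequence lengths
        x = torch.randn(BATCH, seq_len, HIDDEN, device=device)
        for _ in range(3):  # warm-up: exclude one-time setup costs
            block(x)
        if device == "cuda":
            torch.cuda.synchronize()  # finish queued kernels before timing
        start = time.perf_counter()
        for _ in range(ITERS):
            block(x)
        if device == "cuda":
            torch.cuda.synchronize()
        ms = (time.perf_counter() - start) / ITERS * 1e3
        print(f"seq_len={seq_len:4d}: {ms:7.2f} ms/iter, "
              f"{BATCH * seq_len / ms * 1e3:,.0f} tokens/s")
```

The gradient-accumulation sensitivity the abstract mentions would show up on the training side of such a harness: run backward() on N micro-batches before each optimizer step and observe how throughput varies with N. The sketch above covers only the forward path.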
Pages: 48-57
Page count: 10