Cheaply Estimating Inference Efficiency Metrics for Autoregressive Transformer Models

Cited by: 0
Authors
Narayanan, Deepak [1 ,3 ]
Santhanam, Keshav [2 ]
Henderson, Peter [2 ]
Bommasani, Rishi [2 ]
Lee, Tony [2 ]
Liang, Percy [2 ]
Affiliations
[1] NVIDIA, Santa Clara, CA 95051 USA
[2] Stanford Univ, Stanford, CA 94305 USA
[3] Microsoft Res, Redmond, WA 98052 USA
Keywords: none listed
DOI: not available
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Pages: 21
Abstract
Large language models (LLMs) are highly capable but also computationally expensive. Characterizing the fundamental tradeoff between inference efficiency and model capabilities is thus important, but requires an efficiency metric that is comparable across models from different providers. Unfortunately, raw runtimes measured through black-box APIs do not satisfy this property: model providers can implement software and hardware optimizations orthogonal to the model, and shared infrastructure introduces performance contention. We propose a new metric for inference efficiency, called idealized runtime, which puts models on an equal footing as though they were all served on uniform hardware and software without performance contention, along with a cost model that efficiently estimates this metric for autoregressive Transformer models. We also propose variants of the idealized runtime that incorporate the number and type of accelerators needed to serve the model. Using these metrics, we compare ten LLMs developed in 2022 to provide the first analysis of inference efficiency-capability tradeoffs; we make several observations from this analysis, including the fact that the superior inference runtime performance of certain APIs is often a byproduct of optimizations within the API rather than of the underlying model. Our code is open sourced at https://github.com/stanford-crfm/helm-efficiency.
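To illustrate the kind of roofline-style reasoning a cost model like this can rest on, here is a minimal Python sketch. It is not the paper's actual formulation: the function name, the hardware constants (A100-class peak FLOPs and memory bandwidth), fp16 weights, batch size 1, a single idealized accelerator, and the neglect of KV-cache traffic and attention FLOPs are all simplifying assumptions made for illustration.

```python
# Minimal sketch (illustrative only): per-token decode time is lower-bounded
# by the slower of compute time and weight-read time on idealized hardware.

def idealized_runtime_per_token(
    num_params: float,           # dense Transformer parameter count
    bytes_per_param: float = 2,  # fp16 weights (assumption)
    peak_flops: float = 312e12,  # A100-class dense fp16 peak, FLOP/s (assumption)
    mem_bw: float = 1.555e12,    # A100-class HBM bandwidth, bytes/s (assumption)
) -> float:
    """Rough lower bound, in seconds, to generate one token at batch size 1."""
    flops = 2 * num_params                               # ~2 FLOPs per parameter per token
    compute_time = flops / peak_flops                    # time if compute-bound
    memory_time = num_params * bytes_per_param / mem_bw  # weights read once per token
    return max(compute_time, memory_time)                # roofline: slower term dominates

# Example: a hypothetical 175B-parameter dense model.
print(f"{idealized_runtime_per_token(175e9) * 1e3:.1f} ms/token")
```

At batch size 1 the memory term dominates (every weight is read once per generated token), so the sketch predicts bandwidth-bound decoding. It only shows why such estimates are cheap to compute from the model architecture alone; the paper's actual metric and cost model are more detailed.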