Cheaply Estimating Inference Efficiency Metrics for Autoregressive Transformer Models

Cited by: 0
Authors
Narayanan, Deepak [1 ,3 ]
Santhanam, Keshav [2 ]
Henderson, Peter [2 ]
Bommasani, Rishi [2 ]
Lee, Tony [2 ]
Liang, Percy [2 ]
Affiliations
[1] NVIDIA, Santa Clara, CA 95051 USA
[2] Stanford Univ, Stanford, CA 94305 USA
[3] Microsoft Res, Redmond, WA 98052 USA
Keywords: none listed
DOI: not available
CLC number: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Pages: 21
Abstract
Large language models (LLMs) are highly capable but also computationally expensive. Characterizing the fundamental tradeoff between inference efficiency and model capabilities is thus important, but requires an efficiency metric that is comparable across models from different providers. Unfortunately, raw runtimes measured through black-box APIs do not satisfy this property: model providers can implement software and hardware optimizations orthogonal to the model, and shared infrastructure introduces performance contention. We propose a new metric for inference efficiency, called idealized runtime, which puts models on an equal footing as though they were all served on uniform hardware and software without performance contention, along with a cost model that efficiently estimates this metric for autoregressive Transformer models. We also propose variants of the idealized runtime that incorporate the number and type of accelerators needed to serve the model. Using these metrics, we compare ten LLMs developed in 2022 to provide the first analysis of inference efficiency-capability tradeoffs; we make several observations from this analysis, including the fact that the superior inference runtime performance of certain APIs is often a byproduct of optimizations within the API rather than of the underlying model. Our code is open sourced at https://github.com/stanford-crfm/helm-efficiency.
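To illustrate the kind of roofline-style reasoning a cost model like this can rest on, here is a minimal Python sketch. It is not the paper's actual formulation: the function name, the hardware constants (A100-class peak FLOPs and memory bandwidth), fp16 weights, batch size 1, a single idealized accelerator, and the neglect of KV-cache traffic and attention FLOPs are all simplifying assumptions made for illustration.

```python
# Minimal sketch (illustrative only): per-token decode time is lower-bounded
# by the slower of compute time and weight-read time on idealized hardware.

def idealized_runtime_per_token(
    num_params: float,           # dense Transformer parameter count
    bytes_per_param: float = 2,  # fp16 weights (assumption)
    peak_flops: float = 312e12,  # A100-class dense fp16 peak, FLOP/s (assumption)
    mem_bw: float = 1.555e12,    # A100-class HBM bandwidth, bytes/s (assumption)
) -> float:
    """Rough lower bound, in seconds, to generate one token at batch size 1."""
    flops = 2 * num_params                               # ~2 FLOPs per parameter per token
    compute_time = flops / peak_flops                    # time if compute-bound
    memory_time = num_params * bytes_per_param / mem_bw  # weights read once per token
    return max(compute_time, memory_time)                # roofline: slower term dominates

# Example: a hypothetical 175B-parameter dense model.
print(f"{idealized_runtime_per_token(175e9) * 1e3:.1f} ms/token")
```

At batch size 1 the memory term dominates (every weight is read once per generated token), so the sketch predicts bandwidth-bound decoding. It only shows why such estimates are cheap to compute from the model architecture alone; the paper's actual metric and cost model are more detailed.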