Coordinated DVFS and Precision Control for Deep Neural Networks

Cited by: 18

Authors:

Nabavinejad, Seyed Morteza [1]

Hafez-Kolahi, Hassan [2]

Reda, Sherief [3]

Affiliations:

[1] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran, Iran

[2] Sharif Univ Technol, Dept Comp Engn, Tehran, Iran

[3] Brown Univ, Sch Engn, Providence, RI 02912 USA

Source:

IEEE COMPUTER ARCHITECTURE LETTERS | 2019 / Vol. 18 / No. 02

Funding:

National Science Foundation (USA);

Keywords:

Graphics processing units; Power demand; Time factors; Runtime; Time-frequency analysis; Servers; Neural networks; Deep neural network; hardware accelerator; power; accuracy; response time;

DOI:

10.1109/LCA.2019.2942020

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Traditionally, DVFS has been the main mechanism for trading off performance and power. We observe that Deep Neural Network (DNN) applications offer the possibility to trade off performance, power, and accuracy using both DVFS and numerical precision levels. Our proposed approach, Power-Inference accuracy Trading (PIT), monitors the server's load and accordingly adjusts the precision of the DNN model and the DVFS setting of the GPU to trade off accuracy and power consumption against response time. At high loads with tight request arrivals, PIT leverages the GPU's INT8-precision instructions to dynamically change the precision of deployed DNN models and boosts the GPU frequency to execute requests faster, at the expense of reduced accuracy and higher power consumption. However, when the request arrival rate is relaxed and requests have slack time, PIT deploys high-precision versions of the models to improve accuracy and reduces the GPU frequency to decrease power consumption. We implement and deploy PIT on a state-of-the-art server equipped with a Tesla P40 GPU. Experimental results demonstrate that, depending on the load, PIT can improve response time by up to 11 percent compared to a job scheduler that uses only FP32 precision. It also reduces energy consumption by up to 28 percent, while achieving around 99.5 percent of the accuracy of FP32-only precision.
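The load-driven policy described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Setting` and `choose_setting`, the single load threshold, and the two frequency values are all hypothetical, since the abstract does not state how PIT measures load or which clock levels it uses.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; the abstract gives no thresholds
# or clock frequencies.
HIGH_LOAD_THRESHOLD = 0.8  # normalized load at or above which requests are "tight"
FREQ_HIGH_MHZ = 1500       # assumed boosted GPU clock
FREQ_LOW_MHZ = 800         # assumed reduced GPU clock


@dataclass(frozen=True)
class Setting:
    precision: str     # "INT8" or "FP32"
    gpu_freq_mhz: int  # target GPU frequency


def choose_setting(load: float) -> Setting:
    """Pick a precision and GPU frequency from a normalized load in [0, 1]."""
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be in [0, 1]")
    if load >= HIGH_LOAD_THRESHOLD:
        # High load: favor response time over accuracy and power.
        return Setting("INT8", FREQ_HIGH_MHZ)
    # Relaxed load: favor accuracy and lower power, using the slack time.
    return Setting("FP32", FREQ_LOW_MHZ)
```

A real controller would likely use more than two operating points and would apply the chosen setting through the GPU driver and the inference runtime; the sketch only shows the direction of the trade-off.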

Pages: 136-140

Number of pages: 5
