Optimizing Deep Learning RNN Topologies on Intel Architecture

Cited by: 0
Authors
Banerjee K. [1 ]
Georganas E. [2 ]
Kalamkar D.D. [1 ]
Ziv B. [3 ]
Segal E. [3 ]
Anderson C. [4 ]
Heinecke A. [2 ]
Affiliations
[1] Intel Corporation, Bangalore
[2] Intel Corporation, Santa Clara
[3] Intel Corporation, Haifa
[4] Intel Corporation, Oregon
Keywords
Bandwidth-bound kernel; Compute-bound kernel; GEMM; Intel Xeon; LSTM
DOI
10.14529/jsfi190304
Abstract
Recurrent neural network (RNN) models have been found to be well suited for processing temporal data. In this work, we present an optimized implementation of the vanilla RNN cell and its two popular variants, LSTM and GRU, for the Intel Xeon architecture. Typical implementations of these RNN cells employ one or two large matrix multiplication (GEMM) calls and then apply the element-wise operations (sigmoid/tanh) to the GEMM results. While this approach is easy to implement by exploiting vendor-optimized GEMM library calls, the data reuse depends on how the GEMMs are parallelized and is sub-optimal for the GEMM sizes stemming from small minibatches. Also, the element-wise operations are exposed as a bandwidth-bound kernel after the GEMM, which is typically a compute-bound kernel. To address this discrepancy, we implement a parallel blocked matrix GEMM in order to (a) achieve load balance, (b) maximize weight matrix reuse, and (c) fuse the element-wise operations after partial GEMM blocks are computed, while they are still hot in cache. Additionally, we bring the time step loop into our cell to further increase the weight reuse and to amortize the overhead of transforming the weights into a blocked layout. The results show that our implementation is generally faster than the Intel MKL-DNN library implementations; e.g., for the RNN cell, the forward pass is up to ~3x faster, whereas the backward/weight-update pass is up to ~5x faster. Furthermore, we investigate high-performance implementations of the sigmoid and tanh activation functions that achieve various levels of accuracy. These implementations rely on minimax polynomial approximations, rational polynomials, Taylor expansions and exponential approximation techniques. Our vectorized implementations can be flexibly integrated into deep learning computations with different accuracy requirements without compromising performance; in fact, they outperform the vectorized, reduced-accuracy, vendor-optimized (Intel SVML) library by 1.6-2.6x, while the speedup over GNU libm is close to two orders of magnitude. All our experiments are conducted on Intel's latest Cascade Lake architecture. © The Authors 2019.
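As a rough illustration of the fusion idea the abstract describes, the plain-C sketch below accumulates one output block of the GEMM completely and then applies the activation while that block is still hot in cache, instead of running a separate bandwidth-bound element-wise pass over the full output. The block sizes, the scalar loops, and the Pade-style rational tanh are illustrative assumptions, not the paper's actual kernel, which uses JIT-generated, vectorized microkernels and minimax polynomial approximations.

    /* Sketch: blocked GEMM with the element-wise activation fused in,
       applied per output block while it is hot in cache (assumed shapes
       and block sizes; not the paper's JIT-generated kernel). */
    #include <stddef.h>

    #define BM 64   /* rows of C per block (hidden-state block) */
    #define BN 16   /* cols of C per block (minibatch block)    */

    /* Cheap rational tanh approximation (assumption: the classic
       Pade-style formula x*(27+x^2)/(27+9*x^2), clamped to the range
       where it is reasonably accurate); the paper explores minimax,
       rational, Taylor, and exponential variants at several accuracies. */
    static inline float tanh_approx(float x)
    {
        if (x >  3.0f) x =  3.0f;
        if (x < -3.0f) x = -3.0f;
        float x2 = x * x;
        return x * (27.0f + x2) / (27.0f + 9.0f * x2);
    }

    /* C = tanh(W * X + bias), with the bias preloaded into C. Each
       (BM x BN) block of C is fully accumulated over K, then the
       activation runs on that block before moving on. */
    void fused_gemm_tanh(const float *W,  /* M x K, row-major */
                         const float *X,  /* K x N, row-major */
                         float *C,        /* M x N, row-major, bias on entry */
                         size_t M, size_t K, size_t N)
    {
        for (size_t mb = 0; mb < M; mb += BM) {
            for (size_t nb = 0; nb < N; nb += BN) {
                /* accumulate the full K dimension for this block */
                for (size_t k = 0; k < K; ++k)
                    for (size_t m = mb; m < mb + BM && m < M; ++m)
                        for (size_t n = nb; n < nb + BN && n < N; ++n)
                            C[m * N + n] += W[m * K + k] * X[k * N + n];
                /* fuse the activation on the still-cached block */
                for (size_t m = mb; m < mb + BM && m < M; ++m)
                    for (size_t n = nb; n < nb + BN && n < N; ++n)
                        C[m * N + n] = tanh_approx(C[m * N + n]);
            }
        }
    }

In an RNN cell this routine would be invoked once per time step with the same blocked weight matrix W, which is what makes hoisting the time step loop into the cell pay off: the blocked-layout transform of W is done once and its reuse grows with the sequence length.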
Pages: 64-85
Page count: 21