Training Google Neural Machine Translation on an Intel CPU Cluster

Cited by: 1
Authors
Kalamkar, Dhiraj D. [1 ]
Banerjee, Kunal [1 ]
Srinivasan, Sudarshan [1 ]
Sridharan, Srinivas [1 ]
Georganas, Evangelos [2 ]
Smorkalov, Mikhail E. [3 ]
Xu, Cong [3 ]
Heinecke, Alexander [2 ]
Affiliations
[1] Intel Corp, Parallel Comp Lab, Bangalore, Karnataka, India
[2] Intel Corp, Parallel Comp Lab, Santa Clara, CA USA
[3] Intel Corp, Intel Arch Graph & Sw, Nizhnii Novgorod, Russia
Keywords
machine translation; recurrent neural networks; TensorFlow; LIBXSMM; Intel architecture;
DOI
10.1109/cluster.2019.8891019
CLC number
TP3 [Computing technology, computer technology];
Discipline code
0812;
Abstract
Google's neural machine translation (GNMT) is a state-of-the-art recurrent neural network (RNN/LSTM) based language translation application. It is computationally more demanding than well-studied convolutional neural networks (CNNs). Also, in contrast to CNNs, RNNs heavily mix compute-bound and memory-bound layers, which requires careful tuning on a latency-oriented machine to optimally use fast on-die memories for best single-processor performance. Additionally, due to the massive compute demand, it is essential to distribute the entire workload among several processors and even compute nodes. To the best of our knowledge, this is the first work which attempts to scale this application on an Intel CPU cluster. Our CPU-based GNMT optimization, the first of its kind, achieves this through the following steps: (i) we choose a monolithic long short-term memory (LSTM) cell implementation from the LIBXSMM library (specifically tuned for CPUs) and integrate it into TensorFlow, (ii) we modify the GNMT code to use a fused time-step LSTM op for the encoding stage, (iii) we combine the Horovod and Intel MLSL scaling libraries for improved performance on multiple nodes, and (iv) we extend the bucketing logic, which groups sentences of similar length together, across multiple nodes to achieve load balance across ranks. In summary, we demonstrate that, due to these changes, we are able to outperform Google's stock CPU-based GNMT implementation by ~2x on a single node and potentially enable a more than 25x speedup using a 16-node CPU cluster.
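Of the four steps in the abstract, the bucketing extension in step (iv) is the most self-contained, so a minimal illustrative sketch of the underlying idea follows. It shows length-based bucketing with round-robin distribution of each bucket across ranks, so that every rank receives a similar mix of sentence lengths and hence similar per-step compute. The function names (bucket_by_length, assign_buckets_to_ranks) and the bucket_width parameter are hypothetical and do not come from the paper, which extends GNMT's existing bucketing logic rather than reimplementing it.

    # Illustrative sketch only: not the paper's implementation.
    from collections import defaultdict
    from typing import Dict, List, Sequence

    def bucket_by_length(sentences: Sequence[Sequence[str]],
                         bucket_width: int = 10) -> Dict[int, List[int]]:
        """Group sentence indices into buckets of similar token length."""
        buckets: Dict[int, List[int]] = defaultdict(list)
        for idx, tokens in enumerate(sentences):
            buckets[len(tokens) // bucket_width].append(idx)
        return buckets

    def assign_buckets_to_ranks(buckets: Dict[int, List[int]],
                                num_ranks: int) -> List[List[int]]:
        """Deal each bucket's sentences out across ranks in round-robin order."""
        per_rank: List[List[int]] = [[] for _ in range(num_ranks)]
        for _, indices in sorted(buckets.items()):
            for i, idx in enumerate(indices):
                per_rank[i % num_ranks].append(idx)
        return per_rank

    if __name__ == "__main__":
        corpus = [s.split() for s in [
            "a short sentence",
            "a noticeably longer sentence with several more tokens than the first",
            "another short one",
            "a medium length example sentence goes here",
        ]]
        shards = assign_buckets_to_ranks(bucket_by_length(corpus, bucket_width=4),
                                         num_ranks=2)
        for rank, idxs in enumerate(shards):
            print("rank", rank, "-> sentence indices", idxs)

The point of spreading every length bucket across all ranks is that, in synchronous data-parallel training, a rank that happened to draw mostly long sentences would otherwise dominate each step and stall the others at the gradient exchange.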
Pages: 193-202
Page count: 10