Training Google Neural Machine Translation on an Intel CPU Cluster

Cited by: 1
Authors
Kalamkar, Dhiraj D. [1 ]
Banerjee, Kunal [1 ]
Srinivasan, Sudarshan [1 ]
Sridharan, Srinivas [1 ]
Georganas, Evangelos [2 ]
Smorkalov, Mikhail E. [3 ]
Xu, Cong [3 ]
Heinecke, Alexander [2 ]
Affiliations
[1] Intel Corp, Parallel Comp Lab, Bangalore, Karnataka, India
[2] Intel Corp, Parallel Comp Lab, Santa Clara, CA USA
[3] Intel Corp, Intel Arch Graph & Sw, Nizhnii Novgorod, Russia
Keywords
machine translation; recurrent neural networks; TensorFlow; LIBXSMM; Intel architecture;
DOI
10.1109/cluster.2019.8891019
CLC number
TP3 [Computing technology, computer technology];
Discipline code
0812;
Abstract
Google's neural machine translation (GNMT) is a state-of-the-art recurrent neural network (RNN/LSTM) based language translation application. It is computationally more demanding than well-studied convolutional neural networks (CNNs). Also, in contrast to CNNs, RNNs heavily mix compute-bound and memory-bound layers, which requires careful tuning on a latency-oriented machine to optimally use fast on-die memories for best single-processor performance. Additionally, due to the massive compute demand, it is essential to distribute the entire workload among several processors and even compute nodes. To the best of our knowledge, this is the first work which attempts to scale this application on an Intel CPU cluster. Our CPU-based GNMT optimization, the first of its kind, achieves this through the following steps: (i) we choose a monolithic long short-term memory (LSTM) cell implementation from the LIBXSMM library (specifically tuned for CPUs) and integrate it into TensorFlow, (ii) we modify the GNMT code to use a fused time-step LSTM op for the encoding stage, (iii) we combine the Horovod and Intel MLSL scaling libraries for improved performance on multiple nodes, and (iv) we extend the bucketing logic, which groups sentences of similar length together, across multiple nodes to achieve load balance across ranks. In summary, we demonstrate that, due to these changes, we are able to outperform Google's stock CPU-based GNMT implementation by ~2x on a single node and potentially enable a more than 25x speedup using a 16-node CPU cluster.
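Of the four steps in the abstract, the bucketing extension in step (iv) is the most self-contained, so a minimal illustrative sketch of the underlying idea follows. It shows length-based bucketing with round-robin distribution of each bucket across ranks, so that every rank receives a similar mix of sentence lengths and hence similar per-step compute. The function names (bucket_by_length, assign_buckets_to_ranks) and the bucket_width parameter are hypothetical and do not come from the paper, which extends GNMT's existing bucketing logic rather than reimplementing it.

    # Illustrative sketch only: not the paper's implementation.
    from collections import defaultdict
    from typing import Dict, List, Sequence

    def bucket_by_length(sentences: Sequence[Sequence[str]],
                         bucket_width: int = 10) -> Dict[int, List[int]]:
        """Group sentence indices into buckets of similar token length."""
        buckets: Dict[int, List[int]] = defaultdict(list)
        for idx, tokens in enumerate(sentences):
            buckets[len(tokens) // bucket_width].append(idx)
        return buckets

    def assign_buckets_to_ranks(buckets: Dict[int, List[int]],
                                num_ranks: int) -> List[List[int]]:
        """Deal each bucket's sentences out across ranks in round-robin order."""
        per_rank: List[List[int]] = [[] for _ in range(num_ranks)]
        for _, indices in sorted(buckets.items()):
            for i, idx in enumerate(indices):
                per_rank[i % num_ranks].append(idx)
        return per_rank

    if __name__ == "__main__":
        corpus = [s.split() for s in [
            "a short sentence",
            "a noticeably longer sentence with several more tokens than the first",
            "another short one",
            "a medium length example sentence goes here",
        ]]
        shards = assign_buckets_to_ranks(bucket_by_length(corpus, bucket_width=4),
                                         num_ranks=2)
        for rank, idxs in enumerate(shards):
            print("rank", rank, "-> sentence indices", idxs)

The point of spreading every length bucket across all ranks is that, in synchronous data-parallel training, a rank that happened to draw mostly long sentences would otherwise dominate each step and stall the others at the gradient exchange.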
Pages: 193-202
Page count: 10