Training Google Neural Machine Translation on an Intel CPU Cluster

Cited by: 1
Authors
Kalamkar, Dhiraj D. [1 ]
Banerjee, Kunal [1 ]
Srinivasan, Sudarshan [1 ]
Sridharan, Srinivas [1 ]
Georganas, Evangelos [2 ]
Smorkalov, Mikhail E. [3 ]
Xu, Cong [3 ]
Heinecke, Alexander [2 ]
Affiliations
[1] Intel Corp, Parallel Comp Lab, Bangalore, Karnataka, India
[2] Intel Corp, Parallel Comp Lab, Santa Clara, CA USA
[3] Intel Corp, Intel Arch Graph & Sw, Nizhnii Novgorod, Russia
Keywords
machine translation; recurrent neural networks; TensorFlow; LIBXSMM; Intel architecture;
DOI
10.1109/cluster.2019.8891019
Chinese Library Classification (CLC)
TP3 [computing technology, computer technology];
Subject classification code
0812;
Abstract
Google's neural machine translation (GNMT) is a state-of-the-art recurrent neural network (RNN/LSTM) based language translation application. It is computationally more demanding than well-studied convolutional neural networks (CNNs). Also, in contrast to CNNs, RNNs heavily mix compute-bound and memory-bound layers, which requires careful tuning on a latency-oriented machine to optimally use fast on-die memories for best single-processor performance. Additionally, due to the massive compute demand, it is essential to distribute the entire workload among several processors and even compute nodes. To the best of our knowledge, this is the first work that attempts to scale this application on an Intel CPU cluster. Our CPU-based GNMT optimization, the first of its kind, achieves this through the following steps: (i) we choose a monolithic long short-term memory (LSTM) cell implementation from the LIBXSMM library (specifically tuned for CPUs) and integrate it into TensorFlow, (ii) we modify the GNMT code to use a fused time-step LSTM op for the encoding stage, (iii) we combine the Horovod and Intel MLSL scaling libraries for improved performance on multiple nodes, and (iv) we extend the bucketing logic, which groups sentences of similar length together, to multiple nodes in order to achieve load balance across ranks. In summary, we demonstrate that with these changes we outperform Google's stock CPU-based GNMT implementation by roughly 2x on a single node and potentially enable more than a 25x speedup on a 16-node CPU cluster.
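Step (iv) of the abstract extends the usual length-based bucketing to multiple ranks so that each worker receives a similar mix of sentence lengths. The following minimal, self-contained Python sketch illustrates that idea only; the bucket boundaries, the function names (assign_bucket, shard_for_rank), and the per-bucket round-robin dealing policy are illustrative assumptions rather than the paper's actual implementation, and in a real multi-node run the rank and world size would come from Horovod / Intel MLSL instead of being passed explicitly.

def assign_bucket(src_len, boundaries=(10, 20, 30, 40, 50)):
    # Return the index of the first bucket whose upper boundary covers src_len;
    # sentences longer than every boundary fall into an overflow bucket.
    for i, bound in enumerate(boundaries):
        if src_len <= bound:
            return i
    return len(boundaries)

def shard_for_rank(sentence_lengths, rank, size):
    # Group sentence indices by length bucket, then deal each bucket out
    # round-robin so every rank gets a similar mix of lengths (load balance).
    buckets = {}
    for idx, length in enumerate(sentence_lengths):
        buckets.setdefault(assign_bucket(length), []).append(idx)
    shard = []
    for b in sorted(buckets):
        shard.extend(buckets[b][rank::size])
    return shard

if __name__ == "__main__":
    lengths = [7, 12, 45, 18, 33, 9, 50, 22]
    # Hypothetical stand-in for the rank/size of a 2-rank distributed job.
    for r in range(2):
        print("rank", r, "->", shard_for_rank(lengths, rank=r, size=2))

With per-bucket dealing, every rank draws batches from the same sentence-length distribution, so no rank is left with disproportionately long sequences; the paper's other optimizations (the fused time-step LSTM op from LIBXSMM and the Horovod/MLSL combination) are not modeled by this sketch.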
Pages: 193 - 202
Page count: 10
Related papers
50 records in total
  • [1] Improvements of Google Neural Machine Translation
    Li, Rui
    Jiang, Meijia
    OVERSEAS ENGLISH, 2017, (15) : 132 - 134
  • [2] Improving Neural Machine Translation by Bidirectional Training
    Ding, Liang
    Wu, Di
    Tao, Dacheng
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3278 - 3284
  • [3] Discriminant training of neural networks for machine translation
    Quoc-Khanh Do
    Allauzen, Alexandre
    Yvon, Francois
    TRAITEMENT AUTOMATIQUE DES LANGUES, 2016, 57 (01): : 111 - 135
  • [4] Generative adversarial training for neural machine translation
    Yang, Zhen
    Chen, Wei
    Wang, Feng
    Xu, Bo
    NEUROCOMPUTING, 2018, 321 : 146 - 155
  • [5] Speed Up the Training of Neural Machine Translation
    Liu, Xinyue
    Wang, Weixuan
    Liang, Wenxin
    Li, Yuangang
    NEURAL PROCESSING LETTERS, 2020, 51 (01) : 231 - 249
  • [7] Minimum Risk Training for Neural Machine Translation
    Shen, Shiqi
    Cheng, Yong
    He, Zhongjun
    He, Wei
    Wu, Hua
    Sun, Maosong
    Liu, Yang
    PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 1683 - 1692
  • [8] Training Neural Machine Translation To Apply Terminology Constraints
    Dinu, Georgiana
    Mathur, Prashant
    Federico, Marcello
    Al-Onaizan, Yaser
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3063 - 3068
  • [9] Shallow-to-Deep Training for Neural Machine Translation
    Li, Bei
    Wang, Ziyang
    Liu, Hui
    Jiang, Yufan
    Du, Quan
    Xiao, Tong
    Wang, Huizhen
    Zhu, Jingbo
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 995 - 1005
  • [10] Pre-training Methods for Neural Machine Translation
    Wang, Mingxuan
    Li, Lei
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: TUTORIAL ABSTRACTS, 2021, : 21 - 25