Training Google Neural Machine Translation on an Intel CPU Cluster

Cited by: 1
Authors
Kalamkar, Dhiraj D. [1 ]
Banerjee, Kunal [1 ]
Srinivasan, Sudarshan [1 ]
Sridharan, Srinivas [1 ]
Georganas, Evangelos [2 ]
Smorkalov, Mikhail E. [3 ]
Xu, Cong [3 ]
Heinecke, Alexander [2 ]
Affiliations
[1] Intel Corp, Parallel Comp Lab, Bangalore, Karnataka, India
[2] Intel Corp, Parallel Comp Lab, Santa Clara, CA USA
[3] Intel Corp, Intel Arch Graph & Sw, Nizhnii Novgorod, Russia
Keywords
machine translation; recurrent neural networks; TensorFlow; LIBXSMM; Intel architecture;
DOI
10.1109/cluster.2019.8891019
Chinese Library Classification (CLC)
TP3 [Computing technology; computer technology]
Discipline classification code
0812
Abstract
Google's neural machine translation (GNMT) is a state-of-the-art recurrent neural network (RNN/LSTM)-based language translation application. It is computationally more demanding than well-studied convolutional neural networks (CNNs). Also, in contrast to CNNs, RNNs heavily mix compute-bound and memory-bound layers, which requires careful tuning on a latency-optimized machine to use fast on-die memories optimally for the best single-processor performance. Additionally, due to the massive compute demand, it is essential to distribute the entire workload among several processors and even compute nodes. To the best of our knowledge, this is the first work that attempts to scale this application on an Intel CPU cluster. Our CPU-based GNMT optimization, the first of its kind, achieves this through the following steps: (i) we choose a monolithic long short-term memory (LSTM) cell implementation from the LIBXSMM library (specifically tuned for CPUs) and integrate it into TensorFlow; (ii) we modify the GNMT code to use a fused time-step LSTM op for the encoding stage; (iii) we combine the Horovod and Intel MLSL scaling libraries for improved performance on multiple nodes; and (iv) we extend the bucketing logic, which groups sentences of similar length together, to multiple nodes to achieve load balance across ranks. In summary, we demonstrate that with these changes we are able to outperform Google's stock CPU-based GNMT implementation by ~2x on a single node and potentially enable more than a 25x speedup using a 16-node CPU cluster.
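To make steps (iii) and (iv) of the abstract more concrete, below is a minimal Python sketch (not the authors' code) of the two ideas: dealing batches of similar-length sentences across ranks so every rank gets comparable work per step, and wrapping the optimizer with Horovod so gradients are averaged across nodes. The helper names (bucket_by_length, shard_for_rank), the bucket width, and the batch size are illustrative assumptions; the TF 1.x-style optimizer API reflects the TensorFlow generation used in the paper.

```python
# Hedged sketch of length-bucketed sharding plus Horovod data parallelism.
import horovod.tensorflow as hvd
import tensorflow as tf

hvd.init()  # one rank per process; hvd.size() ranks in total


def bucket_by_length(sentences, bucket_width=10):
    """Group token sequences into buckets of similar length (hypothetical helper)."""
    buckets = {}
    for sent in sentences:
        key = len(sent) // bucket_width
        buckets.setdefault(key, []).append(sent)
    return buckets


def shard_for_rank(buckets, rank, world_size, batch_size=128):
    """Deal fixed-size batches from every bucket round-robin across ranks,
    so each rank trains on batches of comparable sentence length."""
    my_batches = []
    for key in sorted(buckets):
        sents = buckets[key]
        batches = [sents[i:i + batch_size]
                   for i in range(0, len(sents), batch_size)]
        # Rank r takes batches r, r + world_size, r + 2*world_size, ...
        my_batches.extend(batches[rank::world_size])
    return my_batches


# Example wiring: each rank keeps only its own shard of the tokenized corpus.
# `corpus` is assumed to be a list of token-id sequences.
# local_batches = shard_for_rank(bucket_by_length(corpus), hvd.rank(), hvd.size())

# DistributedOptimizer all-reduces gradients across ranks (MPI/MLSL backend);
# the learning-rate scaling with hvd.size() is a common convention, not the paper's.
opt = tf.train.AdamOptimizer(learning_rate=0.001 * hvd.size())
opt = hvd.DistributedOptimizer(opt)
```

The sharding step is what keeps ranks load-balanced: because every rank draws from every length bucket, no rank is stuck with only long sentences while others finish early.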
Pages: 193-202 (10 pages)
Related Papers
50 in total
  • [41] Neural Machine Translation as a Novel Approach to Machine Translation
    Benkova, Lucia
    Benko, Lubomir
    DIVAI 2020: 13TH INTERNATIONAL SCIENTIFIC CONFERENCE ON DISTANCE LEARNING IN APPLIED INFORMATICS, 2020, : 499 - 508
  • [42] Training and Inference Methods for High-Coverage Neural Machine Translation
    Yang, Michael
    Liu, Yixin
    Mayuranath, Rahul
    NEURAL GENERATION AND TRANSLATION, 2020, : 119 - 128
  • [43] Data Rejuvenation: Exploiting Inactive Training Examples for Neural Machine Translation
    Jiao, Wenxiang
    Wang, Xing
    He, Shilin
    King, Irwin
    Lyu, Michael R.
    Tu, Zhaopeng
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 2255 - 2266
  • [44] Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation
    Xu, Yangyifan
    Liu, Yijin
    Meng, Fandong
    Zhang, Jiajun
    Xu, Jinan
    Zhou, Jie
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 511 - 516
  • [45] Neural Name Translation Improves Neural Machine Translation
    Li, Xiaoqing
    Yan, Jinghui
    Zhang, Jiajun
    Zong, Chengqing
    MACHINE TRANSLATION, CWMT 2018, 2019, 954 : 93 - 100
  • [46] The Event/Machine of Neural Machine Translation?
    Regnauld, Arnaud
    JOURNAL OF AESTHETICS AND PHENOMENOLOGY, 2022, 9 (02) : 141 - 154
  • [47] IS GOOGLE TRANSLATION THE BEST TRANSLATION MACHINE FOR YOUR MOOC TRANSLATION? WE KNOW THE ANSWER
    Kerr, R.
    EDULEARN16: 8TH INTERNATIONAL CONFERENCE ON EDUCATION AND NEW LEARNING TECHNOLOGIES, 2016, : 5842 - 5842
  • [48] Noise-Based Adversarial Training for Enhancing Agglutinative Neural Machine Translation
    Ji, Yatu
    Hou, Hongxu
    Chen, Junjie
    Wu, Nier
    PRICAI 2019: TRENDS IN ARTIFICIAL INTELLIGENCE, PT I, 2019, 11670 : 392 - 396
  • [49] CSP: Code-Switching Pre-training for Neural Machine Translation
    Yang, Zhen
    Hu, Bojie
    Han, Ambyera
    Huang, Shen
    Ju, Qi
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 2624 - 2636
  • [50] Sequence-Level Training for Non-Autoregressive Neural Machine Translation
    Shao, Chenze
    Feng, Yang
    Zhang, Jinchao
    Meng, Fandong
    Zhou, Jie
    COMPUTATIONAL LINGUISTICS, 2021, 47 (04) : 891 - 925