NEEBS: Nonexpert large-scale environment building system for deep neural network

Cited by: 0
Authors
Tajima, Yoshiharu [1 ,2 ]
Asaoka, Masahiro [1 ]
Tabuchi, Akihiro [1 ]
Kasagi, Akihiko [1 ]
Tabaru, Tsuguchika [1 ]
Affiliations
[1] Fujitsu Ltd, Fujitsu Labs, Kawasaki, Kanagawa, Japan
[2] Fujitsu Ltd, Fujitsu Labs, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, Kanagawa, Japan
Source
Concurrency and Computation: Practice and Experience
Keywords
BERT; deep neural network; large-scale clusters; natural language processing
DOI
10.1002/cpe.7499
Chinese Library Classification (CLC)
TP31 [Computer software]
Discipline codes
081202; 0835
Abstract
Deep neural networks (DNNs) have greatly improved the accuracy of various tasks in areas such as natural language processing (NLP). Obtaining a highly accurate DNN model requires repeated training on a huge dataset, which in turn requires a large-scale cluster whose compute nodes are tightly coupled by high-speed interconnects so that large amounts of intermediate data can be exchanged with very low latency. However, fully exploiting the computational power of such a cluster requires knowledge of its components, such as the distributed file system, the interconnect, and optimized high-performance libraries. We have developed a Non-Expert large-scale Environment Building System (NEEBS) that helps a user build a fast-running training environment on a large-scale cluster. It automatically installs and configures the applications and the necessary libraries. It also prepares tools that stage both data and executable programs, as well as launcher scripts suited to the applications and to the job-submission system of the cluster. NEEBS achieves 93.91% throughput scalability in NLP pretraining. We also present an approach that uses a large-scale computation environment built by NEEBS to reduce the pretraining time of highly accurate DNN models for NLP. We trained BERT-3.9b and BERT-xlarge, two Bidirectional Encoder Representations from Transformers (BERT) models, with a dense masked language model (MLM) objective on the Megatron-LM framework, and evaluated the improvement in training time and training efficiency on a Japanese-language dataset using 768 graphics processing units (GPUs) on the AI Bridging Cloud Infrastructure (ABCI). NEEBS improved learning efficiency per iteration by a factor of 10 and completed the pretraining of BERT-xlarge in 4.7 h; the same pretraining would take 5 months on a single GPU. To confirm that the BERT models were correctly pretrained, we evaluated their accuracy on two tasks: the Stanford Natural Language Inference corpus translated into Japanese (JSNLI) and Twitter reputation analysis (TwitterRA). BERT-3.9b achieved 94.30% accuracy on JSNLI, and BERT-xlarge achieved 90.63% accuracy on TwitterRA. We constructed pretrained models with accuracy comparable to other Japanese BERT models in a shorter time.
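For illustration, the following is a minimal Python sketch of the kind of launcher-script generation the abstract attributes to NEEBS: it emits a batch script that first stages the dataset to node-local storage and then starts one training process per GPU. Everything in it is an assumption made for illustration, not the paper's actual API or configuration: the names JobSpec and write_job_script are invented, the SGE-style directives (rt_F, h_rt, $SGE_LOCALDIR) merely follow the Grid-Engine conventions used on ABCI, and the example values are placeholders.

    """Hypothetical sketch: generating a scheduler-ready launcher script.

    All names, directives, and values here are illustrative assumptions,
    not NEEBS's actual API or the paper's configuration.
    """
    from dataclasses import dataclass
    from pathlib import Path


    @dataclass
    class JobSpec:
        nodes: int          # compute nodes to request (e.g. 192 x 4 GPUs = 768 GPUs)
        gpus_per_node: int  # GPUs available on each node
        walltime: str       # wall-clock limit understood by the scheduler
        dataset: Path       # dataset directory to stage to node-local storage
        train_cmd: str      # training command, e.g. a Megatron-LM invocation


    def write_job_script(spec: JobSpec, out: Path) -> None:
        """Emit a batch script that stages data, then launches one rank per GPU."""
        world_size = spec.nodes * spec.gpus_per_node
        script = f"""#!/bin/bash
    #$ -l rt_F={spec.nodes}
    #$ -l h_rt={spec.walltime}

    # Stage the dataset to fast node-local storage first, so the shared
    # file system is not read concurrently by {world_size} processes.
    cp -r {spec.dataset} "$SGE_LOCALDIR"/

    # Launch one MPI rank per GPU across all nodes.
    mpirun -np {world_size} -npernode {spec.gpus_per_node} {spec.train_cmd}
    """
        out.write_text(script)
        out.chmod(0o755)  # make the generated script executable


    if __name__ == "__main__":
        write_job_script(
            JobSpec(nodes=192, gpus_per_node=4, walltime="5:00:00",
                    dataset=Path("/groups/example/ja_corpus"),
                    train_cmd="python pretrain_bert.py"),
            Path("submit_pretrain.sh"),
        )

Staging the dataset to node-local storage before the run, as in this sketch, reflects the staging tools the abstract describes: it avoids having hundreds of GPU processes hit the shared distributed file system at once.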
Pages: 15