NEEBS: Nonexpert large-scale environment building system for deep neural network

Cited by: 0
Authors
Tajima, Yoshiharu [1 ,2 ]
Asaoka, Masahiro [1 ]
Tabuchi, Akihiro [1 ]
Kasagi, Akihiko [1 ]
Tabaru, Tsuguchika [1 ]
Affiliations
[1] Fujitsu Ltd, Fujitsu Labs, Kawasaki, Kanagawa, Japan
[2] Fujitsu Ltd, Fujitsu Labs, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, Kanagawa, Japan
Source
Concurrency and Computation: Practice and Experience
Keywords
BERT; deep neural network; large-scale clusters; natural language processing
DOI
10.1002/cpe.7499
Chinese Library Classification (CLC)
TP31 [Computer software]
Discipline codes
081202; 0835
Abstract
Deep neural networks (DNNs) have greatly improved the accuracy of various tasks in areas such as natural language processing (NLP). Obtaining a highly accurate DNN model requires repeated training on a huge dataset, which in turn requires a large-scale cluster whose compute nodes are tightly coupled by high-speed interconnects so that large amounts of intermediate data can be exchanged with very low latency. However, fully exploiting the computational power of such a cluster requires knowledge of its components, such as the distributed file system, the interconnect, and optimized high-performance libraries. We have developed a Non-Expert large-scale Environment Building System (NEEBS) that helps a user build a fast-running training environment on a large-scale cluster. It automatically installs and configures the applications and the necessary libraries. It also prepares tools that stage both data and executable programs, as well as launcher scripts suited to the applications and to the job-submission system of the cluster. NEEBS achieves 93.91% throughput scalability in NLP pretraining. We also present an approach that uses a large-scale computation environment built by NEEBS to reduce the pretraining time of highly accurate DNN models for NLP. We trained BERT-3.9b and BERT-xlarge, two Bidirectional Encoder Representations from Transformers (BERT) models, with a dense masked language model (MLM) objective on the Megatron-LM framework, and evaluated the improvement in training time and training efficiency on a Japanese-language dataset using 768 graphics processing units (GPUs) on the AI Bridging Cloud Infrastructure (ABCI). NEEBS improved learning efficiency per iteration by a factor of 10 and completed the pretraining of BERT-xlarge in 4.7 h; the same pretraining would take 5 months on a single GPU. To confirm that the BERT models were correctly pretrained, we evaluated their accuracy on two tasks: the Stanford Natural Language Inference corpus translated into Japanese (JSNLI) and Twitter reputation analysis (TwitterRA). BERT-3.9b achieved 94.30% accuracy on JSNLI, and BERT-xlarge achieved 90.63% accuracy on TwitterRA. We constructed pretrained models with accuracy comparable to other Japanese BERT models in a shorter time.
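For illustration, the following is a minimal Python sketch of the kind of launcher-script generation the abstract attributes to NEEBS: it emits a batch script that first stages the dataset to node-local storage and then starts one training process per GPU. Everything in it is an assumption made for illustration, not the paper's actual API or configuration: the names JobSpec and write_job_script are invented, the SGE-style directives (rt_F, h_rt, $SGE_LOCALDIR) merely follow the Grid-Engine conventions used on ABCI, and the example values are placeholders.

    """Hypothetical sketch: generating a scheduler-ready launcher script.

    All names, directives, and values here are illustrative assumptions,
    not NEEBS's actual API or the paper's configuration.
    """
    from dataclasses import dataclass
    from pathlib import Path


    @dataclass
    class JobSpec:
        nodes: int          # compute nodes to request (e.g. 192 x 4 GPUs = 768 GPUs)
        gpus_per_node: int  # GPUs available on each node
        walltime: str       # wall-clock limit understood by the scheduler
        dataset: Path       # dataset directory to stage to node-local storage
        train_cmd: str      # training command, e.g. a Megatron-LM invocation


    def write_job_script(spec: JobSpec, out: Path) -> None:
        """Emit a batch script that stages data, then launches one rank per GPU."""
        world_size = spec.nodes * spec.gpus_per_node
        script = f"""#!/bin/bash
    #$ -l rt_F={spec.nodes}
    #$ -l h_rt={spec.walltime}

    # Stage the dataset to fast node-local storage first, so the shared
    # file system is not read concurrently by {world_size} processes.
    cp -r {spec.dataset} "$SGE_LOCALDIR"/

    # Launch one MPI rank per GPU across all nodes.
    mpirun -np {world_size} -npernode {spec.gpus_per_node} {spec.train_cmd}
    """
        out.write_text(script)
        out.chmod(0o755)  # make the generated script executable


    if __name__ == "__main__":
        write_job_script(
            JobSpec(nodes=192, gpus_per_node=4, walltime="5:00:00",
                    dataset=Path("/groups/example/ja_corpus"),
                    train_cmd="python pretrain_bert.py"),
            Path("submit_pretrain.sh"),
        )

Staging the dataset to node-local storage before the run, as in this sketch, reflects the staging tools the abstract describes: it avoids having hundreds of GPU processes hit the shared distributed file system at once.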
Pages: 15