Efficient Implementation of the Room Simulator for Training Deep Neural Network Acoustic Models

Cited by: 0

Authors:

Kim, Chanwoo [1,3]

Variani, Ehsan [2]

Narayanan, Arun [2]

Bacchiani, Michiel [2]

Affiliations:

[1] Samsung Res, Seoul, South Korea

[2] Google Speech, Mountain View, CA USA

[3] Google, Mountain View, CA USA

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

Simulated data; room acoustics; robust speech recognition; deep learning

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we describe how to efficiently implement an acoustic room simulator to generate large-scale simulated data for training deep neural networks. Although the Google Room Simulator in [1] was shown to be quite effective in reducing Word Error Rates (WERs) for far-field applications by generating simulated far-field training sets, it requires a very large number of FFTs. The Room Simulator accounted for approximately 80% of CPU usage in our CPU/GPU training architecture [2]. In this work, we implement efficient OverLap Addition (OLA) based filtering using the open-source FFTW3 library. Further, we investigate the effects of the Room Impulse Response (RIR) lengths. Experimentally, we conclude that we can cut the tail portions of RIRs whose power is less than 20 dB below the maximum power without sacrificing speech recognition accuracy. However, we observe that cutting the RIR tail beyond this threshold harms speech recognition accuracy for rerecorded test sets. Using these approaches, we were able to reduce CPU usage for the room simulator portion down to 9.69% in the CPU/GPU training architecture. Profiling results show that we obtain a 22.4-times speed-up on a single machine and a 37.3-times speed-up on Google's distributed training infrastructure.
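The two ideas named in the abstract, RIR tail truncation and overlap-add filtering, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses NumPy's FFT in place of FFTW3, the function names `truncate_rir` and `ola_filter` and the `block_len` parameter are hypothetical, and the truncation rule (keep everything up to the last sample whose instantaneous power is within 20 dB of the peak) is only one possible reading of the threshold described above.

```python
import numpy as np


def truncate_rir(rir, threshold_db=20.0):
    """Cut the RIR after the last sample within threshold_db of peak power.

    Assumption: "power" is taken as the squared sample value; the paper
    may define it differently.
    """
    power = rir ** 2
    peak = power.max()
    if peak == 0.0:
        return rir.copy()
    floor = peak * 10.0 ** (-threshold_db / 10.0)
    last = np.nonzero(power >= floor)[0][-1]
    return rir[: last + 1].copy()


def ola_filter(signal, rir, block_len=4096):
    """Convolve signal with rir using FFT-based overlap-add."""
    # Pick an FFT size that holds one block's full linear convolution.
    n_fft = 1
    while n_fft < block_len + len(rir) - 1:
        n_fft *= 2
    rir_spec = np.fft.rfft(rir, n_fft)  # computed once, reused per block
    out = np.zeros(len(signal) + len(rir) - 1)
    for start in range(0, len(signal), block_len):
        block = signal[start:start + block_len]
        seg = np.fft.irfft(np.fft.rfft(block, n_fft) * rir_spec, n_fft)
        n_valid = len(block) + len(rir) - 1
        # Overlapping tails of consecutive blocks are summed here.
        out[start:start + n_valid] += seg[:n_valid]
    return out
```

A shorter RIR permits a smaller FFT size per block, which is where truncation and overlap-add filtering would be expected to combine to reduce the FFT cost.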


Pages: 3028-3032

Page count: 5
