Efficient Implementation of the Room Simulator for Training Deep Neural Network Acoustic Models

Cited by: 0

Authors:

Kim, Chanwoo [1, 3]

Variani, Ehsan [2]

Narayanan, Arun [2]

Bacchiani, Michiel [2]

Affiliations:

[1] Samsung Research, Seoul, South Korea

[2] Google Speech, Mountain View, CA, USA

[3] Google, Mountain View, CA, USA

Source:

19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018), Vols. 1-6: Speech Research for Emerging Markets in Multilingual Societies | 2018

Keywords:

Simulated data; room acoustics; robust speech recognition; deep learning

DOI:

Not available

Chinese Library Classification number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we describe how to efficiently implement an acoustic room simulator that generates large-scale simulated data for training deep neural networks. Although the Google Room Simulator [1] proved quite effective at reducing Word Error Rates (WERs) for far-field applications by generating simulated far-field training sets, it requires a very large number of FFTs: the Room Simulator accounted for approximately 80% of CPU usage in our CPU/GPU training architecture [2]. In this work, we implement efficient OverLap Addition (OLA)-based filtering using the open-source FFTW3 library. We further investigate the effect of Room Impulse Response (RIR) length. Experimentally, we find that the tail portion of each RIR whose power lies more than 20 dB below the maximum power can be cut without sacrificing speech recognition accuracy. However, cutting the RIR tail more aggressively than this threshold harms speech recognition accuracy on re-recorded test sets. Using these approaches, we reduced the CPU usage of the room simulator portion to 9.69% in the CPU/GPU training architecture. Profiling results show a 22.4-times speed-up on a single machine and a 37.3-times speed-up on Google's distributed training infrastructure.
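The abstract names two techniques: FFT-based overlap-add (OLA) filtering of an utterance with an RIR, and cutting the RIR tail once its power drops 20 dB below the peak. The following is a minimal sketch of both under stated assumptions, not the authors' implementation. NumPy's `numpy.fft` stands in for the FFTW3 C library used in the paper; the helper names `truncate_rir`, `ola_filter` and `next_pow2` and the block size of 1024 samples are hypothetical; and the truncation rule (keep everything up to the last sample whose instantaneous power is within 20 dB of the peak) is one plausible reading of the abstract rather than the paper's exact criterion.

```python
import numpy as np


def next_pow2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def truncate_rir(rir, threshold_db: float = 20.0) -> np.ndarray:
    """Cut the RIR tail after the last sample whose power is within
    `threshold_db` of the peak power (assumed truncation rule)."""
    rir = np.asarray(rir, dtype=np.float64)
    power = rir ** 2
    peak = power.max()
    if peak == 0.0:
        return rir[:1]  # all-zero RIR: keep a single sample
    floor = peak * 10.0 ** (-threshold_db / 10.0)  # 20 dB -> 1% of peak power
    last = np.nonzero(power >= floor)[0][-1]
    return rir[: last + 1]


def ola_filter(x, h, block_size: int = 1024) -> np.ndarray:
    """Linear convolution of signal x with RIR h by FFT overlap-add."""
    x = np.asarray(x, dtype=np.float64)
    h = np.asarray(h, dtype=np.float64)
    out_len = len(x) + len(h) - 1
    # FFT size large enough that each block's linear convolution does not wrap around.
    n_fft = next_pow2(block_size + len(h) - 1)
    H = np.fft.rfft(h, n_fft)  # RIR spectrum: computed once, reused for every block
    y = np.zeros(out_len)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        seg = np.fft.irfft(np.fft.rfft(block, n_fft) * H, n_fft)
        seg_len = len(block) + len(h) - 1
        y[start:start + seg_len] += seg[:seg_len]  # overlap-add this block's output
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)  # stand-in for 1 s of 16 kHz audio
    rir = np.exp(-np.arange(8000) / 800.0) * rng.standard_normal(8000)  # toy decaying RIR
    short_rir = truncate_rir(rir, threshold_db=20.0)
    y = ola_filter(speech, short_rir)
    print(len(rir), len(short_rir), np.allclose(y, np.convolve(speech, short_rir)))
```

In this sketch the RIR spectrum `H` is computed once and reused for every block, so each block costs one forward and one inverse FFT. Truncating the RIR shortens `len(h)` and can therefore shrink `n_fft`, which is one way shorter RIRs could reduce the FFT workload; the actual savings reported in the paper come from the authors' own FFTW3-based implementation.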


Pages: 3028-3032

Number of pages: 5
