Reverberation robust acoustic modeling using i-vectors with time delay neural networks

Authors
Peddinti, Vijayaditya [1]
Chen, Guoguo [1]
Povey, Daniel [1, 2]
Khudanpur, Sanjeev [1, 2]
Affiliations
[1] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
[2] Johns Hopkins Univ, Human Language Technol Ctr Excellence, Baltimore, MD 21218 USA
Funding
National Science Foundation (USA)
Keywords
far field speech recognition; time delay neural networks; reverberation; RECOGNITION;
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206 ; 082403 ;
Abstract
In reverberant environments there are long-term interactions between speech and corrupting sources. In this paper a time delay neural network (TDNN) architecture, capable of learning long-term temporal relationships and translation-invariant representations, is used for reverberation-robust acoustic modeling. Further, i-vectors are used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing a 10% relative improvement in word error rate (WER). Sub-sampling the outputs of the TDNN layers across time steps reduces training time. Using a parallel training algorithm, we show that the TDNN can be trained on approximately 5500 hours of speech data in 3 days using up to 32 GPUs. The TDNN is shown to provide results competitive with state-of-the-art systems in the IARPA ASpIRE challenge, with 27.7% WER on the dev_test set.
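The two ideas in the abstract, appending an i-vector to every input frame and splicing only a sub-sampled set of time offsets at each TDNN layer, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, feature and i-vector dimensions, layer widths, and the offsets (-2..2) and (-1, 2) are hypothetical values chosen only for illustration.

```python
import numpy as np

def append_ivector(feats, ivector):
    """Tile one i-vector across time and append it to every frame."""
    tiled = np.tile(ivector, (feats.shape[0], 1))
    return np.concatenate([feats, tiled], axis=1)

def tdnn_layer(x, offsets, weight, bias):
    """Splice frames of x (T, D) at the given time offsets, then apply
    an affine transform followed by ReLU.

    Only output frames whose whole context lies inside x are produced,
    so the output is shorter than the input.
    """
    lo, hi = min(offsets), max(offsets)
    start = max(0, -lo)              # first frame with full left context
    end = x.shape[0] - max(0, hi)    # one past the last frame with full right context
    spliced = np.concatenate([x[start + o:end + o] for o in offsets], axis=1)
    return np.maximum(spliced @ weight + bias, 0.0)

rng = np.random.default_rng(0)

# Hypothetical sizes: 20 frames of 40-dim features, one 100-dim i-vector.
feats = rng.standard_normal((20, 40))
ivector = rng.standard_normal(100)
x = append_ivector(feats, ivector)   # shape (20, 140)

# Layer 1 splices the full context -2..2 (5 frames).
offsets1 = (-2, -1, 0, 1, 2)
w1 = rng.standard_normal((x.shape[1] * len(offsets1), 64)) * 0.01
h1 = tdnn_layer(x, offsets1, w1, np.zeros(64))

# Layer 2 splices only two frames, at offsets -1 and +2, instead of
# all four frames in -1..2. This is the sub-sampling idea.
offsets2 = (-1, 2)
w2 = rng.standard_normal((h1.shape[1] * len(offsets2), 64)) * 0.01
h2 = tdnn_layer(h1, offsets2, w2, np.zeros(64))

print(x.shape, h1.shape, h2.shape)
```

In this sketch the second layer multiplies a 128-dimensional spliced input rather than the 256-dimensional input that the full context -1..2 would need, which is the kind of saving that sub-sampling is meant to provide.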
Pages: 2440-2444
Number of pages: 5