JOINT OPTIMIZATION OF NEURAL NETWORK-BASED WPE DEREVERBERATION AND ACOUSTIC MODEL FOR ROBUST ONLINE ASR

Cited by: 0
Authors
Heymann, Jahn [1]
Drude, Lukas [1]
Haeb-Umbach, Reinhold [1]
Kinoshita, Keisuke [2]
Nakatani, Tomohiro [2]
Affiliations
[1] Paderborn Univ, Dept Commun Engn, Paderborn, Germany
[2] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
Keywords
dereverberation; speech enhancement; joint optimization; robust ASR
DOI
10.1109/icassp.2019.8683294
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Signal dereverberation using the Weighted Prediction Error (WPE) method has proven to be an effective means of raising the accuracy of far-field speech recognition. First proposed as an iterative algorithm, it was reformulated in follow-up work as a recursive least squares algorithm, which enabled its use in online applications. For this algorithm, the estimate of the power spectral density (PSD) of the anechoic signal plays an important role and strongly influences performance. Recently, we showed that using a neural network PSD estimator leads to improved performance for online automatic speech recognition. This, however, comes at a price: training the network requires parallel data, i.e., utterances simultaneously available in clean and reverberated form. Here we propose to overcome this limitation by training the network jointly with the acoustic model of the speech recognizer. Specifically, the gradients computed from the cross-entropy loss between the target senone sequence and the acoustic model network output are backpropagated through the complex-valued dereverberation filter estimation to the neural network for PSD estimation. Evaluation on two databases demonstrates improved performance for online processing scenarios while imposing fewer requirements on the available training data, thus widening the range of applications.
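The recursive least squares form of WPE mentioned in the abstract can be illustrated with a minimal sketch for a single channel and a single frequency bin. This is not the authors' implementation: the function names, the default parameters (`taps`, `delay`, `alpha`), and the moving-average PSD estimator are all assumptions made for illustration. In the paper the PSD comes from a neural network trained through the acoustic model loss; the sketch only exposes a `psd_fn` hook where such an estimator would plug in, and plain NumPy provides no gradients.

```python
import numpy as np


def moving_average_psd(window):
    """Hypothetical stand-in for the PSD estimator: mean power of recent frames."""
    return max(float(np.mean(np.abs(window) ** 2)), 1e-10)


def online_wpe_bin(y, taps=3, delay=1, alpha=0.99, psd_fn=moving_average_psd):
    """RLS-style WPE sketch for one channel and one frequency bin.

    y: complex STFT coefficients over time, shape (T,).
    Returns the dereverberated coefficients, shape (T,).
    """
    y = np.asarray(y, dtype=complex)
    g = np.zeros(taps, dtype=complex)       # linear prediction filter
    inv_cov = np.eye(taps, dtype=complex)   # inverse PSD-weighted covariance
    # Zero-pad the past so every frame has a full delayed window.
    padded = np.concatenate([np.zeros(delay + taps - 1, dtype=complex), y])
    out = np.empty_like(y)
    for t in range(len(y)):
        # Delayed observations [y[t-delay], ..., y[t-delay-taps+1]].
        y_tilde = padded[t:t + taps][::-1]
        # Subtract the predicted late reverberation using the previous filter.
        x_hat = y[t] - np.vdot(g, y_tilde)
        # PSD estimate from the frames up to and including the current one.
        psd = psd_fn(padded[t:t + taps + delay])
        # Gain vector, inverse covariance update, and filter update.
        p_y = inv_cov @ y_tilde
        gain = p_y / (alpha * psd + np.vdot(y_tilde, p_y))
        inv_cov = (inv_cov - np.outer(gain, y_tilde.conj()) @ inv_cov) / alpha
        g = g + gain * np.conj(x_hat)
        out[t] = x_hat
    return out
```

Each frame needs only the current PSD estimate and the state from the previous frame, which is what makes the recursion usable online. To train the PSD estimator jointly as the abstract describes, the same recursion would have to be written in a framework with automatic differentiation and complex-valued support, so that the loss gradient can flow back through the filter update into the estimator.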
Pages: 6655-6659
Number of pages: 5
    [J]. JOURNAL OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING, 2022, 22 (01) : 253 - 263