COMBINATION STRATEGY BASED ON RELATIVE PERFORMANCE MONITORING FOR MULTI-STREAM REVERBERANT SPEECH RECOGNITION

Cited by: 0
Authors
Xiong, Feifei [1 ,2 ]
Goetze, Stefan [1 ,2 ]
Meyer, Bernd T. [3 ]
Affiliations
[1] Fraunhofer Inst Digital Media Technol IDMT, Project Grp Hearing Speech & Audio Technol HSA, Oldenburg, Germany
[2] Cluster Excellence Hearing4all, Oldenburg, Germany
[3] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA
Keywords
Reverberant speech recognition; multi-stream; posteriors; performance monitoring; weighted system combination;
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
A multi-stream framework with deep neural network (DNN) classifiers is applied to improve automatic speech recognition (ASR) in environments with different reverberation characteristics. We propose a room parameter estimation model to establish a reliable combination strategy that operates on either DNN posterior probabilities or word lattices. The model is implemented by training a multilayer perceptron on auditory-inspired features to distinguish between and generalize to various reverberant conditions, and its output is shown to be highly correlated with the relative ASR performance across the streams, i.e., relative performance monitoring, in contrast to conventional mean temporal distance based performance monitoring for a single stream. Compared to traditional multi-condition training, average relative word error rate improvements of 7.7% and 9.4% are achieved by the proposed combination strategies operating on posteriors and lattices, respectively, when the multi-stream ASR system is tested in known and unknown simulated reverberant environments as well as in realistically recorded conditions from the REVERB Challenge evaluation set.
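The abstract describes a weighted combination of parallel recognition streams driven by a stream-reliability estimate. The following is a minimal sketch, not the authors' implementation: it assumes each stream yields frame-level DNN posteriors over a shared senone set and that a hypothetical room-parameter MLP supplies one reliability score per stream, which is mapped to softmax weights for a linear posterior combination. The temperature parameter and the scoring interface are assumptions for illustration only; the paper's lattice-based variant would combine word lattices rather than frame-level posteriors.

    # Minimal sketch (not the authors' implementation) of weighted posterior
    # combination across parallel ASR streams.
    import numpy as np

    def combine_posteriors(posteriors, scores, temperature=1.0):
        """Combine per-stream posteriors using softmax weights from scores.

        posteriors : list of (T, S) arrays, one per stream (T frames, S senones)
        scores     : per-stream reliability scores (hypothetical output of the
                     room-parameter estimation model; higher = more reliable)
        temperature: assumed knob controlling how sharply the best stream is favoured
        """
        scores = np.asarray(scores, dtype=float)
        # Softmax over streams -> convex combination weights summing to 1.
        w = np.exp(scores / temperature)
        w /= w.sum()
        # Weighted linear combination of the posterior matrices.
        combined = sum(wi * p for wi, p in zip(w, posteriors))
        # Renormalise per frame so each row remains a proper distribution.
        combined /= combined.sum(axis=1, keepdims=True)
        return combined, w

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        T, S = 10, 5  # 10 frames, 5 senone classes (toy sizes)
        streams = [rng.dirichlet(np.ones(S), size=T) for _ in range(3)]
        post, weights = combine_posteriors(streams, scores=[0.2, 1.5, 0.7])
        print("stream weights:", np.round(weights, 3))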
Pages: 4870-4874
Number of pages: 5