COMBINATION STRATEGY BASED ON RELATIVE PERFORMANCE MONITORING FOR MULTI-STREAM REVERBERANT SPEECH RECOGNITION

Cited by: 0
Authors
Xiong, Feifei [1 ,2 ]
Goetze, Stefan [1 ,2 ]
Meyer, Bernd T. [3 ]
Affiliations
[1] Fraunhofer Inst Digital Media Technol IDMT, Project Grp Hearing Speech & Audio Technol HSA, Oldenburg, Germany
[2] Cluster Excellence Hearing4all, Oldenburg, Germany
[3] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD USA
Keywords
Reverberant speech recognition; multi-stream; posteriors; performance monitoring; weighted system combination;
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
A multi-stream framework with deep neural network (DNN) classifiers is applied to improve automatic speech recognition (ASR) in environments with different reverberation characteristics. We propose a room parameter estimation model to establish a reliable combination strategy that operates on either DNN posterior probabilities or word lattices. The model is implemented by training a multilayer perceptron on auditory-inspired features to distinguish between and generalize to various reverberant conditions, and its output is shown to be highly correlated with the relative ASR performance across the streams, i.e., relative performance monitoring, in contrast to conventional mean temporal distance based performance monitoring for a single stream. Compared to traditional multi-condition training, average relative word error rate improvements of 7.7% and 9.4% are achieved by the proposed combination strategies operating on posteriors and lattices, respectively, when the multi-stream ASR system is tested in known and unknown simulated reverberant environments as well as in realistically recorded conditions from the REVERB Challenge evaluation set.
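The abstract describes a weighted combination of parallel recognition streams driven by a stream-reliability estimate. The following is a minimal sketch, not the authors' implementation: it assumes each stream yields frame-level DNN posteriors over a shared senone set and that a hypothetical room-parameter MLP supplies one reliability score per stream, which is mapped to softmax weights for a linear posterior combination. The temperature parameter and the scoring interface are assumptions for illustration only; the paper's lattice-based variant would combine word lattices rather than frame-level posteriors.

    # Minimal sketch (not the authors' implementation) of weighted posterior
    # combination across parallel ASR streams.
    import numpy as np

    def combine_posteriors(posteriors, scores, temperature=1.0):
        """Combine per-stream posteriors using softmax weights from scores.

        posteriors : list of (T, S) arrays, one per stream (T frames, S senones)
        scores     : per-stream reliability scores (hypothetical output of the
                     room-parameter estimation model; higher = more reliable)
        temperature: assumed knob controlling how sharply the best stream is favoured
        """
        scores = np.asarray(scores, dtype=float)
        # Softmax over streams -> convex combination weights summing to 1.
        w = np.exp(scores / temperature)
        w /= w.sum()
        # Weighted linear combination of the posterior matrices.
        combined = sum(wi * p for wi, p in zip(w, posteriors))
        # Renormalise per frame so each row remains a proper distribution.
        combined /= combined.sum(axis=1, keepdims=True)
        return combined, w

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        T, S = 10, 5  # 10 frames, 5 senone classes (toy sizes)
        streams = [rng.dirichlet(np.ones(S), size=T) for _ in range(3)]
        post, weights = combine_posteriors(streams, scores=[0.2, 1.5, 0.7])
        print("stream weights:", np.round(weights, 3))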
Pages: 4870-4874
Number of pages: 5