Unsupervised modulation filter learning for noise-robust speech recognition

Cited by: 11

Authors:

Agrawal, Purvi [1]

Ganapathy, Sriram [1]

Affiliations:

[1] Indian Inst Sci, Bangalore, Karnataka, India

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 2017 / Vol. 142 / Issue 03

Keywords:

FEATURES;

DOI:

10.1121/1.5001926

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

The modulation filtering approach to robust automatic speech recognition (ASR) is based on enhancing perceptually relevant regions of the modulation spectrum while suppressing the regions susceptible to noise. In this paper, a data-driven unsupervised modulation filter learning scheme is proposed using a convolutional restricted Boltzmann machine. The initial filter is learned using the speech spectrogram, while subsequent filters are learned using residual spectrograms. The modulation filtered spectrograms are used for ASR experiments on noisy and reverberant speech, where these features provide significant improvements over other robust features. Furthermore, the application of the proposed method to semi-supervised learning is investigated. © 2017 Acoustical Society of America.

Pages: 1686-1692

Number of pages: 7

50 items in total

[1] Sparse coding of the modulation spectrum for noise-robust automatic speech recognition
Sara Ahmadi
Seyed Mohammad Ahadi
Bert Cranen
Lou Boves
[J]. EURASIP Journal on Audio, Speech, and Music Processing, 2014
[2] Sparse coding of the modulation spectrum for noise-robust automatic speech recognition
Ahmadi, Sara
Ahadi, Seyed Mohammad
Cranen, Bert
Boves, Lou
[J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2014, : 1 - 20
[3] Noise-Robust speech recognition of Conversational Telephone Speech
Chen, Gang
Tolba, Hesham
O'Shaughnessy, Douglas
[J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1101 - 1104
[4] Unsupervised learning of time-frequency patches as a noise-robust representation of speech
Van Segbroeck, Maarten
Van Hamme, Hugo
[J]. SPEECH COMMUNICATION, 2009, 51 (11) : 1124 - 1138
[5] EXTENDED VTS FOR NOISE-ROBUST SPEECH RECOGNITION
van Dalen, R. C.
Gales, M. J. F.
[J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 3829 - 3832
[6] Covariance Modelling for Noise-Robust Speech Recognition
van Dalen, R. C.
Gales, M. J. F.
[J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2000 - 2003
[7] An Overview of Noise-Robust Automatic Speech Recognition
Li, Jinyu
Deng, Li
Gong, Yifan
Haeb-Umbach, Reinhold
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (04) : 745 - 777
[8] Noise-robust Attention Learning for End-to-End Speech Recognition
Higuchi, Yosuke
Tawara, Naohiro
Ogawa, Atsunori
Iwata, Tomoharu
Kobayashi, Tetsunori
Ogawa, Tetsuji
[J]. 28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 311 - 315
[9] Extended VTS for Noise-Robust Speech Recognition
van Dalen, Rogier C.
Gales, Mark J. F.
[J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04) : 733 - 743
[10] Frame decorrelation for noise-robust speech recognition
Jung, HY
Kim, DY
Un, CK
[J]. ELECTRONICS LETTERS, 1996, 32 (13) : 1163 - 1164
