Normalizing the speech modulation spectrum for robust speech recognition

Cited by: 0

Authors:

Xiao, Xiong [1,2]

Chng, Eng Siong [1]

Li, Haizhou [1,2]

Affiliations:

[1] Nanyang Technological University, School of Computer Engineering, Singapore

[2] Institute for Infocomm Research, Singapore

Source:

2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 | 2007

Keywords:

speech recognition; feature normalization; modulation spectrum; square-root Wiener filter; temporal filter;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

This paper presents a novel feature normalization technique for robust speech recognition. The proposed technique normalizes the temporal structure of the features to reduce the feature variation caused by environmental interference. Specifically, it normalizes the utterance-dependent feature modulation spectrum to a reference function by filtering the features with a square-root Wiener filter in the temporal domain. We show experimentally that the proposed technique, when combined with mean and variance normalization (MVN), significantly reduces the word error rate on the AURORA-2 task, with a relative error rate reduction of 69.11% compared to the baseline.
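
The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a raw periodogram as the modulation-spectrum estimate, applies the filter as a zero-phase gain in the DFT domain (circular filtering) rather than as a designed temporal filter, and uses hypothetical names (`modulation_psd`, `sqrt_wiener_normalize`, `ref_psd`, `eps`). The reference function is assumed here to be an average spectrum over clean trajectories.

```python
import numpy as np

def modulation_psd(x):
    """Periodogram of one feature trajectory (mean removed).

    Assumption: a raw periodogram stands in for whatever spectral
    estimator a real system would use.
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.abs(np.fft.rfft(centered)) ** 2 / len(x)

def sqrt_wiener_normalize(x, ref_psd, eps=1e-8):
    """Scale each modulation frequency by sqrt(P_ref / P_utt).

    The gain is real and non-negative, so the phase of the trajectory
    is left unchanged; only the magnitude spectrum is reshaped.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    spec = np.fft.rfft(x - mean)
    utt_psd = np.abs(spec) ** 2 / len(x)
    gain = np.sqrt(ref_psd / (utt_psd + eps))  # square-root Wiener-style gain
    y = np.fft.irfft(spec * gain, n=len(x))
    return y + mean  # restore the mean; MVN can be applied afterwards

# Usage with synthetic data (a hypothetical stand-in for cepstral trajectories).
rng = np.random.default_rng(0)
clean = rng.standard_normal((20, 200))                 # 20 "clean" trajectories
ref_psd = np.mean([modulation_psd(c) for c in clean], axis=0)
noisy = 0.3 * rng.standard_normal(200) + 1.5           # one distorted trajectory
normalized = sqrt_wiener_normalize(noisy, ref_psd)
```

With a raw periodogram the gain forces the output spectrum to match the reference almost exactly. A smoothed spectral estimate would give a gentler filter, which is likely closer to what a practical system needs.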


Pages: 1021 / +

Page count: 2
