Multiple resolution analysis for robust automatic speech recognition

Cited by: 7

Authors:

Gemello, R

Mana, F

Albesano, D

De Mori, R

Affiliations:

[1] Loquendo, I-10149 Turin, Italy

[2] Univ Avignon, F-84911 Avignon, France

Source:

COMPUTER SPEECH AND LANGUAGE | 2006 / Vol. 20 / Issue 01

Keywords:

DOI:

10.1016/j.csl.2004.06.001

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper investigates the potential of exploiting the redundancy implicit in multiple resolution analysis for automatic speech recognition systems. The analysis is performed by a binary tree of elements, each of which consists of a half-band filter followed by a downsampler that discards odd samples. Filter design and feature computation from samples are discussed, and recognition performance with different choices is presented. A paradigm consisting of redundant feature extraction, followed by feature normalization, followed by dimensionality reduction is proposed. Feature normalization is performed by denoising algorithms. Two of them are considered and evaluated, namely signal-to-noise ratio-dependent spectral subtraction and soft thresholding. Dimensionality reduction is performed with principal component analysis. Experiments using telephone corpora and the Aurora3 corpus are reported. They indicate that, with clean speech, the proposed paradigm yields a recognition performance, measured in word error rate, marginally superior to that obtained with perceptual linear prediction coefficients. With noisy data, however, the performance of the proposed analysis paradigm is significantly superior when the same denoising algorithm is applied to all the analysis methods being compared. © 2004 Elsevier Ltd. All rights reserved.
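The analysis tree and soft thresholding mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Haar filter pair, the per-node energy feature, and all function names (`filter_and_downsample`, `analysis_tree`, `soft_threshold`, `node_energies`) are assumptions chosen for brevity, since the abstract does not specify the filters or the feature computation. Principal component analysis is omitted.

```python
import math

# Haar half-band pair: an assumption for illustration only; the abstract
# does not say which half-band filters the paper uses.
LOW = (1 / math.sqrt(2), 1 / math.sqrt(2))
HIGH = (1 / math.sqrt(2), -1 / math.sqrt(2))


def filter_and_downsample(x, taps):
    """FIR-filter x (zero initial state), then discard odd-indexed samples."""
    y = [
        sum(t * x[n - k] for k, t in enumerate(taps) if n - k >= 0)
        for n in range(len(x))
    ]
    return y[::2]  # keep even-indexed samples only


def analysis_tree(x, depth):
    """Return all nodes of the binary tree, level by level (level 0 = input)."""
    levels = [[list(x)]]
    for _ in range(depth):
        children = []
        for node in levels[-1]:
            children.append(filter_and_downsample(node, LOW))
            children.append(filter_and_downsample(node, HIGH))
        levels.append(children)
    return levels


def soft_threshold(values, thr):
    """Shrink each value toward zero by thr; magnitudes below thr become 0."""
    return [math.copysign(max(abs(v) - thr, 0.0), v) for v in values]


def node_energies(levels):
    """One energy per non-root node; keeping every level makes the set redundant."""
    return [sum(s * s for s in node) for level in levels[1:] for node in level]
```

A depth-2 tree over an 8-sample frame yields 2 + 4 = 6 node energies. In the paradigm the abstract describes, such redundant features would then be denoised and reduced in dimension with principal component analysis.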

Pages: 2 / 21

Number of pages: 20
