Multiple resolution analysis for robust automatic speech recognition

Cited by: 7
Authors
Gemello, R
Mana, F
Albesano, D
De Mori, R
Affiliations
[1] Loquendo, I-10149 Turin, Italy
[2] Univ Avignon, F-84911 Avignon, France
Source
COMPUTER SPEECH AND LANGUAGE | 2006 / Vol. 20 / Issue 01
DOI
10.1016/j.csl.2004.06.001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper investigates the potential of exploiting the redundancy implicit in multiple resolution analysis for automatic speech recognition systems. The analysis is performed by a binary tree of elements, each of which consists of a half-band filter followed by a downsampler that discards odd samples. Filter design and feature computation from samples are discussed, and recognition performance with different choices is presented. A paradigm consisting of redundant feature extraction, followed by feature normalization, followed by dimensionality reduction is proposed. Feature normalization is performed by denoising algorithms. Two of them are considered and evaluated: signal-to-noise ratio-dependent spectral subtraction and soft thresholding. Dimensionality reduction is performed with principal component analysis. Experiments using telephone corpora and the Aurora3 corpus are reported. They indicate that, with clean speech, the proposed paradigm leads to a recognition performance, measured in word error rate, marginally superior to the one obtained with perceptual linear prediction coefficients. Performance of the proposed analysis paradigm is, however, significantly superior when it is used with noisy data and the same denoising algorithm is applied to all the analysis methods being compared. (c) 2004 Elsevier Ltd. All rights reserved.
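The pipeline summarized in the abstract (binary filter tree, denoising, dimensionality reduction) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The Haar filter pair, the log-energy features, the threshold value, the point where soft thresholding is applied, and all function names (`split`, `tree_energies`, `soft_threshold`, `pca_reduce`) are assumptions made here for illustration. The abstract does not specify the paper's filter design or feature computation, so none of these choices should be read as the paper's.

```python
import numpy as np

# Assumption: a Haar half-band pair, chosen only because it is short.
LOW = np.array([1.0, 1.0]) / np.sqrt(2.0)
HIGH = np.array([1.0, -1.0]) / np.sqrt(2.0)


def split(x):
    """One tree element: half-band filtering, then keep even-indexed samples."""
    low = np.convolve(x, LOW)[::2]    # [::2] discards the odd-indexed samples
    high = np.convolve(x, HIGH)[::2]
    return low, high


def soft_threshold(x, t):
    """Shrink every value toward zero by t; values with |x| <= t become zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def tree_energies(frame, depth, threshold=0.0):
    """Log energies of all nodes at every tree level (a redundant feature set)."""
    level = [np.asarray(frame, dtype=float)]
    feats = []
    for _ in range(depth):
        children = []
        for node in level:
            children.extend(split(node))
        level = children
        for node in level:
            # Assumption: denoise node samples before computing the energy.
            clean = soft_threshold(node, threshold)
            feats.append(np.log(np.sum(clean ** 2) + 1e-10))
    return np.array(feats)


def pca_reduce(features, k):
    """Project mean-centered feature vectors onto the top-k principal axes."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


# Usage on synthetic frames (random noise, not speech data).
rng = np.random.default_rng(0)
frames = rng.standard_normal((20, 256))
feats = np.stack([tree_energies(f, depth=3, threshold=0.1) for f in frames])
reduced = pca_reduce(feats, k=5)
print(feats.shape, reduced.shape)  # (20, 14) (20, 5)
```

A depth-3 tree yields 2 + 4 + 8 = 14 node energies per frame. Keeping every level, not only the leaves, is what makes the feature set redundant, and the PCA step then reduces it to a compact vector.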
Pages: 2 - 21
Number of pages: 20
Related papers
50 items in total
  • [11] Performance Analysis of Hybrid Model of Robust Automatic Continuous Speech Recognition System
    Babu, C. Ganesh
    Sampath, P.
    Hariharan, S.
    Balakumar, S.
    Noufal, Mohamed
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTING AND INFORMATICS (ICICI 2017), 2017, : 303 - 306
  • [12] Independent component analysis applied to feature extraction for robust automatic speech recognition
    Potamitis, I
    Fakotakis, N
    Kokkinakis, G
    ELECTRONICS LETTERS, 2000, 36 (23) : 1977 - 1978
  • [13] Comparative Evaluation of Speech Enhancement Methods for Robust Automatic Speech Recognition
    Paliwal, Kuldip K.
    Lyons, James G.
    So, Stephen
    Stark, Anthony P.
    Wojcicki, Kamil K.
    2010 4TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ICSPCS), 2010,
  • [14] On properties of modulation spectrum for robust automatic speech recognition
    Kanedera, N
    Hermansky, H
    Arai, T
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 613 - 616
  • [15] A Robust Feature Normalization Algorithm for Automatic Speech Recognition
    Lei, Jianjun
    Yang, Zhen
    Wang, Jian
    FIRST IITA INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2009, : 473 - +
  • [16] Hybrid-task learning for robust automatic speech recognition
    Pironkov, Gueorgui
    Wood, Sean U. N.
    Dupont, Stephane
    COMPUTER SPEECH AND LANGUAGE, 2020, 64
  • [17] Robust automatic speech recognition in the presence of impulsive noise
    Potamitis, I
    Fakotakis, N
    Kokkinakis, G
    ELECTRONICS LETTERS, 2001, 37 (12) : 799 - 800
  • [18] An overview of noise-robust automatic speech recognition
    Li, Jinyu
    Deng, Li
    Gong, Yifan
    Haeb-Umbach, Reinhold
    IEEE Transactions on Audio, Speech and Language Processing, 2014, 22 (04) : 745 - 777
  • [19] Robust Automatic Speech Recognition for Call Center Applications
    Felipe Parra-Gallego, Luis
    Arias-Vergara, Tomas
    Orozco Arroyave, Juan Rafael
    APPLIED COMPUTER SCIENCES IN ENGINEERING, WEA 2021, 2021, 1431 : 72 - 83
  • [20] An Overview of Noise-Robust Automatic Speech Recognition
    Li, Jinyu
    Deng, Li
    Gong, Yifan
    Haeb-Umbach, Reinhold
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (04) : 745 - 777