Orthogonalized distinctive phonetic feature extraction for noise-robust automatic speech recognition

Cited by: 0
Authors
Fukuda, T [1]
Nitta, T [1]
Affiliation
[1] Toyohashi Univ Technol, Grad Sch Engn, Toyohashi, Aichi 4418580, Japan
Source
Keywords
automatic speech recognition (ASR); feature extraction; distinctive phonetic feature (DPF); orthogonalization; local feature (LF);
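DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract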
In this paper, we propose a noise-robust automatic speech recognition system that uses orthogonalized distinctive phonetic features (DPFs) as input to an HMM with diagonal covariance. In the orthogonalized DPF extraction stage, a speech signal is first converted to acoustic features composed of local features (LFs) and DeltaP; then a multilayer neural network (MLN) with 15 x 3 output units, composed of context-dependent DPFs of a preceding context DPF vector, a current DPF vector, and a following context DPF vector, maps the LFs to DPFs. The Karhunen-Loeve transform (KLT) is then applied to orthogonalize each DPF vector in the context-dependent DPFs, using orthogonal bases calculated from a DPF vector that represents 38 Japanese phonemes. Finally, the orthogonalized DPF vectors are decorrelated from one another using the Gram-Schmidt orthogonalization procedure. In experiments, after evaluating the parameters of the MLN input and output units in the DPF extractor, the orthogonalized DPFs are compared with the original DPFs. The orthogonalized DPFs are then evaluated in comparison with a standard parameter set of MFCCs and dynamic features. Next, noise robustness is tested using four types of additive noise. The experimental results show that the use of the proposed orthogonalized DPFs can significantly reduce the error rate in an isolated spoken-word recognition task, both with clean speech and with speech contaminated by additive noise. Furthermore, we achieved significant improvements when combining the orthogonalized DPFs with conventional static MFCCs and DeltaP.
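The final decorrelation step named in the abstract can be illustrated with a minimal Gram-Schmidt sketch. This is not the authors' implementation: the function names, the processing order (preceding, current, following), the modified Gram-Schmidt variant, the absence of normalization, and the toy 4-dimensional vectors (the abstract uses 15 dimensions per vector) are all assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, eps=1e-12):
    """Return mutually orthogonal vectors (modified Gram-Schmidt, no normalization).

    Each vector has its projections onto the previously processed
    vectors subtracted, so the first vector is returned unchanged.
    """
    ortho = []
    for v in vectors:
        w = list(v)
        for q in ortho:
            qq = dot(q, q)
            if qq > eps:  # skip (near-)zero directions to avoid dividing by zero
                coeff = dot(w, q) / qq
                w = [wi - coeff * qi for wi, qi in zip(w, q)]
        ortho.append(w)
    return ortho


# Hypothetical toy stand-ins for the three context-dependent DPF vectors.
preceding = [1.0, 0.0, 1.0, 0.0]
current = [1.0, 1.0, 0.0, 0.0]
following = [0.0, 1.0, 1.0, 1.0]

decorrelated = gram_schmidt([preceding, current, following])
```

After the call, every pair of vectors in `decorrelated` has an inner product of zero up to floating-point error, which is the property a diagonal-covariance HMM benefits from.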
Pages: 1110-1118
Page count: 9
Related papers
50 in total
  • [31] Knowledge Distillation-Based Training of Speech Enhancement for Noise-Robust Automatic Speech Recognition
    Woo Lee, Geon
    Kook Kim, Hong
    Kong, Duk-Jo
    IEEE ACCESS, 2024, 12 : 72707 - 72720
  • [32] A missing data-based feature fusion strategy for noise-robust automatic speech recognition using noisy sensors
    Demiroglu, Cenk
    Anderson, David V.
    Clements, Mark A.
    2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, 2007, : 965 - 968
  • [33] NON-NEGATIVE MATRIX FACTORIZATION AS NOISE-ROBUST FEATURE EXTRACTOR FOR SPEECH RECOGNITION
    Schuller, Bjoern
    Weninger, Felix
    Woellmer, Martin
    Sun, Yang
    Rigoll, Gerhard
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4562 - 4565
  • [34] Noise-Robust Speech Recognition Through Auditory Feature Detection and Spike Sequence Decoding
    Schafer, Phillip B.
    Jin, Dezhe Z.
    NEURAL COMPUTATION, 2014, 26 (03) : 523 - 556
  • [35] An Efficient Noise-Robust Automatic Speech Recognition System using Artificial Neural Networks
    Gupta, Santosh
    Bhurchandi, Kishor M.
    Keskar, Avinash G.
    2016 INTERNATIONAL CONFERENCE ON COMMUNICATION AND SIGNAL PROCESSING (ICCSP), VOL. 1, 2016, : 1873 - 1877
  • [36] EXPLOITING SYNCHRONY SPECTRA AND DEEP NEURAL NETWORKS FOR NOISE-ROBUST AUTOMATIC SPEECH RECOGNITION
    Ma, Ning
    Marxer, Ricard
    Barker, Jon
    Brown, Guy J.
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 490 - 495
  • [37] An FFT-Based Companding Front End for Noise-Robust Automatic Speech Recognition
    Bhiksha Raj
    Lorenzo Turicchia
    Bent Schmidt-Nielsen
    Rahul Sarpeshkar
    EURASIP Journal on Audio, Speech, and Music Processing, 2007
  • [38] An FFT-Based Companding Front End for Noise-Robust Automatic Speech Recognition
    Raj, Bhiksha
    Turicchia, Lorenzo
    Schmidt-Nielsen, Bent
    Sarpeshkar, Rahul
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2007, 2007 (1)
  • [39] An engineering model of the masking for the noise-robust speech recognition
    Park, KY
    Lee, SY
    NEUROCOMPUTING, 2003, 52-4 : 615 - 620
  • [40] Noise-Robust Feature Extraction Based on Forward Masking
    Chiou, Sheng-Chiuan
    Chen, Chia-Ping
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1243 - 1246