Orthogonalized distinctive phonetic feature extraction for noise-robust automatic speech recognition

被引:0
|
作者
Fukuda, T [1 ]
Nitta, T [1 ]
机构
[1] Toyohashi Univ Technol, Grad Sch Engn, Toyohashi, Aichi 4418580, Japan
来源
关键词
automatic speech recognition (ASR); feature extraction; distinctive phonetic feature (DPF); orthogonalization; local feature (LF);
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In this paper, we propose a noise-robust automatic speech recognition system that uses orthogonalized distinctive phonetic features (DPFs) as input of HMM with diagonal covariance. In an orthogonalized DPF extraction stage, first, a speech signal is converted to acoustic features composed of local features (LFs) and DeltaP, then a multilayer neural network (MLN) with 15 x 3 output units composed of context-dependent DPFs of a preceding context DPF vector, a current DPF vector, and a following context DPF vector maps the LFs to DPFs. Karhunen-Loeve transform (KLT) is then applied to orthogonalize each DPF vector in the context-dependent DPFs, using orthogonal bases calculated from a DPF vector that represents 38 Japanese phonemes. Each orthogonalized DPF vector is finally decor-related one another by using Gram-Schmidt orthogonalization procedure. related one another by using Gram In experiments, after evaluating the parameters of the MLN input and output units in the DPF extractor. the orthogonalized DPFs are compared with original DPFs. The orthogonalized DPFs are then evaluated in comparison with a standard parameter set of MFCCs and dynamic features. Next, noise robustness is tested using four types of additive noise. The experimental results show that the use of the proposed orthogonalized DPFs can significantly reduce the error rate in an isolated spoken-word recognition task both with clean speech and with speech contaminated by additive noise. Furthermore, we achieved significant improvements when combining the orthogonalized DPFs with conventional static MFCCs and DeltaP.
引用
收藏
页码:1110 / 1118
页数:9
相关论文
共 50 条
  • [21] Mapping Sparse Representation to State Likelihoods in Noise-Robust Automatic Speech Recognition
    Mahkonen, Katariina
    Hurmalainen, Antti
    Virtanen, Tuomas
    Gemmeke, Jort
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 472 - +
  • [22] A Novel Model Characteristics for Noise-Robust Automatic Speech Recognition Based on HMM
    Rafieee, M. Saadeq
    Khazaei, Ali Akbar
    2010 IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND INFORMATION SECURITY (WCNIS), VOL 2, 2010, : 215 - 218
  • [23] Noise-robust automatic speech recognition using a discriminative echo state network
    Skowronski, Mark D.
    Harris, John G.
    2007 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-11, 2007, : 1771 - 1774
  • [24] EXTENDED VTS FOR NOISE-ROBUST SPEECH RECOGNITION
    van Dalen, R. C.
    Gales, M. J. F.
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3829 - 3832
  • [25] Covariance Modelling for Noise-Robust Speech Recognition
    van Dalen, R. C.
    Gales, M. J. F.
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2000 - 2003
  • [26] Frame decorrelation for noise-robust speech recognition
    Jung, HY
    Kim, DY
    Un, CK
    ELECTRONICS LETTERS, 1996, 32 (13) : 1163 - 1164
  • [27] Frame decorrelation for noise-robust speech recognition
    Korea Advanced Inst of Science and, Technology, Taejon, Korea, Republic of
    Electron Lett, 13 (1163-1164):
  • [28] Extended VTS for Noise-Robust Speech Recognition
    van Dalen, Rogier C.
    Gales, Mark J. F.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04): : 733 - 743
  • [29] A robust feature extraction for automatic speech recognition in noisy environments
    Lima, C
    Almeida, LB
    Monteiro, JL
    2002 6TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I AND II, 2002, : 540 - 543
  • [30] Physiologically Motivated Feature Extraction for Robust Automatic Speech Recognition
    Missaoui, Ibrahim
    Lachiri, Zied
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (04) : 297 - 301