Region Dependent Transform on MLP Features for Speech Recognition

Cited by: 0

Authors:

Ng, Tim [1]

Zhang, Bing [1]

Matsoukas, Spyros [1]

Nguyen, Long [1]

Affiliations:

[1] Raytheon BBN Technol, Cambridge, MA 02138 USA

Source:

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011

Keywords:

Multi-Layer Perceptrons; bottleneck features; Region Dependent Transform; discriminative training; Mandarin speech recognition;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this work, the Region Dependent Transform (RDT) is used as a feature extraction process to combine traditional short-term acoustic features with features derived from Multi-Layer Perceptrons (MLPs) trained on long-term features. Compared to the conventional feature augmentation approach, substantial improvement is obtained. Moreover, an improved RDT training procedure, in which speaker-dependent transforms are taken into account, is proposed for feature combination in Speaker Adaptive Training. By incorporating the higher-dimensional features output from the layer prior to the bottleneck layer into our Speech-to-Text (STT) system using RDT, significant improvement is achieved compared to using the conventional bottleneck features. In summary, by using the features derived from MLP with RDT, an 8.2% to 11.4% relative reduction in Character Error Rate is achieved for our Mandarin STT systems.
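
The abstract does not spell out the RDT formulation, so the following is only a minimal sketch of the general idea as it is commonly described: the input feature space is divided into regions by a Gaussian mixture model, each region has its own linear projection, and the output is the posterior-weighted sum of the projections. Everything here is an assumption made for illustration: the function names (`region_posteriors`, `rdt_combine`), the choice to concatenate the short-term and MLP features before computing region posteriors, the diagonal covariances, and the toy dimensions. It is not the authors' implementation, and the discriminative training of the transforms is not shown.

```python
import math


def gaussian_log_density(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def region_posteriors(x, weights, means, variances):
    """Posterior probability of each GMM component (region) given x."""
    log_scores = [
        math.log(w) + gaussian_log_density(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(log_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rdt_combine(short_term, mlp_feats, gmm, transforms):
    """Hypothetical RDT-style combination of two feature streams.

    The two streams are concatenated (an assumption), region posteriors
    are computed on the concatenated vector, and the output is the
    posterior-weighted sum of the per-region linear projections.
    """
    x = list(short_term) + list(mlp_feats)
    weights, means, variances = gmm
    post = region_posteriors(x, weights, means, variances)
    out_dim = len(transforms[0])
    y = [0.0] * out_dim
    for gamma, region_matrix in zip(post, transforms):
        proj = matvec(region_matrix, x)
        for i in range(out_dim):
            y[i] += gamma * proj[i]
    return y


# Toy example: 2 short-term dims + 1 MLP dim, 2 regions, 2 output dims.
toy_gmm = (
    [0.5, 0.5],                            # mixture weights
    [[0.0, 0.0, 0.0], [4.0, 4.0, 4.0]],    # component means
    [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],    # diagonal variances
)
toy_transforms = [
    [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],    # projection for region 0
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],    # projection for region 1
]
combined = rdt_combine([0.1, 0.2], [0.3], toy_gmm, toy_transforms)
```

By contrast, the conventional feature augmentation approach mentioned in the abstract would stop at the concatenation step and pass the stacked vector on unchanged; the sketch differs in that the projection applied to the stacked vector depends on the region in which the frame falls.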

Pages: 228-231

Page count: 4
