ENHANCED POWER-NORMALIZED FEATURES FOR MANDARIN ROBUST SPEECH RECOGNITION BASED ON A VOICED-UNVOICED-SILENCE DECISION

Cited by: 0

Authors:

Tan, Ying-Wei [1]

Liu, Wen-Ju [1]

Yang, Zhen-Lei [1]

Chen, Ming-Ming [1]

Affiliations:

[1] Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing, Peoples R China

Source:

2014 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (CHINASIP) | 2014

Keywords:

Mandarin robust speech recognition; a voiced-unvoiced-silence decision; enhanced power-normalized features; a weighted harmonic-noise-model; gammatone filterbank

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Subject classification code:

0812

Abstract:

Power-normalized features have been shown to improve the performance of English large-vocabulary continuous speech recognition under different acoustic conditions. In this paper, considering the tonal characteristics of Mandarin speech, we adopt a different strategy for each class of sound, based on a voiced-unvoiced-silence decision. For voiced sounds, harmonic enhancement based on a weighted harmonic-noise-model (WHNM) is applied to accurately capture the salient harmonic information and decrease the effect of various non-stationary noises. After this, standard power-normalized processing (SPNP) is performed. For unvoiced sounds, only the SPNP is used. For silence, a quality frame dropping (FD) algorithm is incorporated into the front-end. As a result, enhanced power-normalized features are obtained and used to process noise-corrupted Mandarin speech. The experimental results show better recognition accuracy for Mandarin continuous speech recognition in noisy environments than the ETSI Advanced Front-End (AFE).
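The per-class routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the energy and zero-crossing-rate classifier, its thresholds, and the names `classify_frame`, `process_frames`, `enhance`, and `spnp` are all hypothetical stand-ins chosen for illustration. The record does not specify how the voiced-unvoiced-silence decision, the WHNM enhancement, the SPNP, or the frame dropping are actually computed, so the latter stages are passed in as caller-supplied functions.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]


def classify_frame(frame: Frame,
                   energy_floor: float = 1e-4,
                   zcr_voiced_max: float = 0.25) -> str:
    """Toy voiced/unvoiced/silence decision (assumption, not the paper's method).

    Uses short-time energy and zero-crossing rate with illustrative thresholds.
    """
    if len(frame) < 2:
        raise ValueError("frame must contain at least two samples")
    energy = sum(x * x for x in frame) / len(frame)
    if energy < energy_floor:
        return "silence"
    # Count sign changes between consecutive samples.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(frame) - 1)
    return "voiced" if zcr <= zcr_voiced_max else "unvoiced"


def process_frames(frames: Sequence[Frame],
                   enhance: Callable[[Frame], Frame],
                   spnp: Callable[[Frame], object]) -> List[object]:
    """Route each frame by class, following the abstract's three cases.

    enhance: hypothetical stand-in for the WHNM-based harmonic enhancement.
    spnp:    hypothetical stand-in for standard power-normalized processing.
    """
    features = []
    for frame in frames:
        label = classify_frame(frame)
        if label == "silence":
            continue  # silence frames are dropped (simplified frame dropping)
        if label == "voiced":
            frame = enhance(frame)  # voiced: enhance first, then SPNP
        features.append(spnp(frame))  # unvoiced: SPNP only
    return features
```

Keeping `enhance` and `spnp` as parameters leaves the routing logic independent of any particular enhancement or feature-extraction method.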


Pages: 222-226

Page count: 5

