CASA Based Speech Separation for Robust Speech Recognition

Cited by: 0

Authors:

Han Runqiang [1]

Zhao Pei [1]

Gao Qin [1]

Zhang Zhiping [1]

Wu Hao [1]

Wu Xihong [1]

Affiliations:

[1] Peking Univ, Natl Lab Machine Percept, Speech & Hearing Res Ctr, Beijing 100871, Peoples R China

Source:

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006

Keywords:

speech separation; CASA; speaker recognition; pitch tracking; units grouping; reconstruction;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper introduces a speech separation system that serves as a front-end processing step for automatic speech recognition (ASR). It employs computational auditory scene analysis (CASA) to separate the target speech from the interfering speech. Specifically, the mixed speech is first preprocessed with an auditory peripheral model. Pitch tracking is then performed, and the dominant pitch is used as the main cue for finding the target speech. Next, the time-frequency (TF) units are merged into segments, and these segments are combined into streams via CASA initial grouping. A regrouping strategy refines these streams using amplitude modulation (AM) cues, and speaker recognition techniques then assign the streams to their corresponding speakers. Finally, the output streams are reconstructed by a cluster-based feature reconstruction to compensate for the data lost in the preceding processing steps. Experimental ASR results show that at low target-to-masker ratio (TMR < -6 dB) the proposed method offers significantly higher recognition accuracy.
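
To illustrate the idea of using a dominant pitch as a grouping cue, the following is a minimal sketch, not the authors' method: the abstract does not say how pitch tracking is done, so a plain autocorrelation peak picker is assumed here. The function names `dominant_pitch_hz` and `two_tone_frame`, the 80-400 Hz search range, and the synthetic two-tone mixture are all hypothetical choices made for this example.

```python
import math


def dominant_pitch_hz(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate the dominant pitch of one frame by autocorrelation peak picking.

    Assumption: this is a generic textbook estimator, not the tracker used in
    the paper. Returns None for a silent frame or when no positive peak exists.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return None
    # Convert the pitch search range into a lag range (in samples).
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        return None
    return sample_rate / best_lag


def two_tone_frame(f_a, amp_a, f_b, amp_b, sample_rate=8000, n=800):
    """Synthetic stand-in for a two-talker mixture: the sum of two sinusoids."""
    return [
        amp_a * math.sin(2 * math.pi * f_a * i / sample_rate)
        + amp_b * math.sin(2 * math.pi * f_b * i / sample_rate)
        for i in range(n)
    ]


# A stronger 120 Hz "target" mixed with a weaker 210 Hz "interferer".
mixture = two_tone_frame(120.0, 1.0, 210.0, 0.3)
print(dominant_pitch_hz(mixture, 8000))
```

For this mixture the estimate should land near the stronger 120 Hz component, although the weaker tone and the integer lag grid shift it by a few hertz. A real system would operate per frame on the outputs of an auditory filterbank rather than on raw sinusoids.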


Pages: 77-80

Page count: 4


[1] Monaural speech separation based on MAXVQ and CASA for robust speech recognition
Li, Peng
Guan, Yong
Wang, Shijin
Xu, Bo
Liu, Wenju
[J]. COMPUTER SPEECH AND LANGUAGE, 2010, 24 (01) : 30 - 44
[2] Deep Neural Network Based Speech Separation for Robust Speech Recognition
Tu Yanhui
Jun, Du
Xu Yong
Dai Lirong
Chin-Hui, Lee
[J]. 2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014 : 532 - 536
[3] Robust speech recognition by integrating speech separation and hypothesis testing
Srinivasan, S
Wang, DL
[J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005 : 89 - 92
[4] Robust speech recognition by integrating speech separation and hypothesis testing
Srinivasan, Soundararajan
Wang, DeLiang
[J]. SPEECH COMMUNICATION, 2010, 52 (01) : 72 - 81
[5] Speech Separation and Recognition Using CASA Segmentation and Language-Based Grouping
Karpukhin, Ivan
Konushin, Anton
[J]. ADVANCED SCIENCE LETTERS, 2018, 24 (10) : 7650 - 7654
[6] Blind speech separation of nonlinear convolutive mixtures for robust speech recognition
Koutras, A.
Dermatas, E.
Kokkinakis, G.
[J]. Control and Intelligent Systems, 2002, 30 (02) : 83 - 90
[7] A Robust Pitch Extractor Based on DTW Lines and CASA with Application in Noisy Speech Recognition
Morales-Cordovilla, Juan A.
Cabanas-Molero, Pablo
Peinado, Antonio M.
Sanchez, Victoria
[J]. ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, 2012, 328 : 197 - 206
[8] SPEECH SEPARATION FOR SPEECH RECOGNITION
DECHEVEIGNE, A
KAWAHARA, H
AIKAWA, K
LEA, A
[J]. JOURNAL DE PHYSIQUE IV, 1994, 4 (C5) : 545 - 548
[9] SPEECH SEPARATION BASED ON THE IMAGES ANALYSIS METHOD IN CASA
Lin, Jie
Fu, Bo
[J]. 2012 INTERNATIONAL CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (LCWAMTIP), 2012 : 33 - 36
[10] Investigation of Speech Separation as a Front-End for Noise Robust Speech Recognition
Narayanan, Arun
Wang, DeLiang
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (04) : 826 - 835
