Monaural speech segregation based on fusion of source-driven with model-driven techniques

Cited by: 19

Authors:

Radfar, Mohammad H.

Dansereau, Richard M.

Sayadiyan, Abolghasem

Affiliations:

[1] Carleton Univ, Dept Syst & Comp Engn, Ottawa, ON K1S 5B6, Canada

[2] Amirkabir Univ Technol, Dept Elect Engn, Tehran 15875 4413, Iran

Source:

SPEECH COMMUNICATION | 2007 / Vol. 49 / Issue 06

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

speech processing; monaural speech segregation; CASA; speech coding; harmonic modelling; vector quantization; envelope extraction; multi-pitch tracking; MIXMAX estimator;

DOI:

10.1016/j.specom.2007.04.007

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

By exploiting methods prevalent in speech coding and synthesis, this paper presents a new single-channel speech segregation technique. The technique integrates a model-driven method with a source-driven method to take advantage of both approaches and to reduce their pitfalls significantly. We apply harmonic modelling, in which the pitch and spectrum envelope are the main components of the analysis and synthesis stages. Pitch values of the two speakers are obtained using a source-driven method. The spectrum envelope is obtained using a new model-driven technique consisting of four components: a trained codebook of vector-quantized envelopes (VQ-based separation), a mixture-maximum approximation (MIXMAX), a minimum mean square error (MMSE) estimator, and a harmonic synthesizer. In contrast with previous model-driven techniques, this approach is speaker independent. It can also separate out the unvoiced regions and suppress the crosstalk effect, both of which are drawbacks of source-driven, or equivalently computational auditory scene analysis (CASA), models. We compare our fused model with both model-driven and source-driven techniques by conducting subjective and objective experiments. The results show that although model-based separation delivers the best quality in the speaker-dependent case, the integrated model outperforms the individual approaches in a speaker-independent scenario. This result supports the idea that the human auditory system draws on both grouping cues (e.g., pitch tracking) and a priori knowledge (e.g., trained quantized envelopes) to segregate speech signals. © 2007 Elsevier B.V. All rights reserved.
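
To make two of the components named in the abstract more concrete, here is a minimal sketch of the mixture-maximum (MIXMAX) approximation combined with a codebook search. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy four-bin envelopes, and the use of one small codebook per speaker are all hypothetical. Pitch tracking, the MMSE estimator, and the harmonic synthesizer are not reproduced. MIXMAX approximates the log-spectrum of a two-speaker mixture by the element-wise maximum of the two speakers' log-spectra, so a mixed frame can be explained by the pair of codebook entries whose maximum lies closest to it.

```python
import math
from itertools import product


def mixmax(log_a, log_b):
    """MIXMAX approximation: the mixture log-spectrum is taken to be
    the element-wise maximum of the two source log-spectra."""
    return [max(a, b) for a, b in zip(log_a, log_b)]


def best_codebook_pair(log_mix, codebook_a, codebook_b):
    """Exhaustively search all codebook pairs and return the index pair
    whose MIXMAX combination has the smallest squared error against the
    observed mixture frame, together with that error."""
    best_pair, best_err = None, math.inf
    for (i, env_a), (j, env_b) in product(enumerate(codebook_a),
                                          enumerate(codebook_b)):
        approx = mixmax(env_a, env_b)
        err = sum((m - x) ** 2 for m, x in zip(log_mix, approx))
        if err < best_err:
            best_pair, best_err = (i, j), err
    return best_pair, best_err


# Hypothetical toy codebooks of log-spectral envelopes (4 frequency bins).
codebook_a = [[5.0, 1.0, 0.0, 0.0], [0.0, 4.0, 1.0, 0.0]]
codebook_b = [[0.0, 0.0, 3.0, 6.0], [1.0, 0.0, 6.0, 2.0]]

# Build a synthetic mixture frame from entry 1 of A and entry 0 of B.
log_mix = mixmax(codebook_a[1], codebook_b[0])

pair, err = best_codebook_pair(log_mix, codebook_a, codebook_b)
print(pair, err)
```

The exhaustive search costs the product of the two codebook sizes per frame, which is acceptable only for a toy example. The selected envelopes would then stand in for each speaker's spectrum envelope in that frame.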

Citation

Pages: 464-476

Page count: 13

