Model-based sequential organization in cochannel speech

Cited by: 50

Authors:

Shao, Y [1]

Wang, DL

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

[2] Ohio State Univ, Ctr Cognit Sci, Columbus, OH 43210 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006 / Vol. 14 / No. 01

Keywords:

auditory scene analysis; cochannel speech; model-based approach; sequential organization; speaker identification (SID); usable speech

DOI:

10.1109/TSA.2005.854106

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

A human listener has the ability to follow a speaker's voice while others are speaking simultaneously; in particular, the listener can organize the time-frequency energy of the same speaker across time into a single stream. In this paper, we focus on sequential organization in cochannel speech, or mixtures of two voices. We extract minimally corrupted segments, or usable speech, in cochannel speech using a robust multipitch tracking algorithm. The extracted, usable speech is shown to capture speaker characteristics and improves speaker identification (SID) performance across various target-to-interferer ratios. To utilize speaker characteristics for sequential organization, we extend the traditional SID framework to cochannel speech and derive a joint objective for sequential grouping and SID, leading to a problem of search for the optimum hypothesis. Subsequently we propose a hypothesis pruning algorithm based on speaker models in order to make the search computationally efficient. Evaluation results show that the proposed system approaches the ceiling SID performance obtained with prior pitch information and yields significant improvement over alternative approaches to sequential organization.

Pages: 289-298

Page count: 10

50 items in total

[1] Unsupervised sequential organization for cochannel speech separation
Hu, Ke
Wang, DeLiang
[J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2794 - +
[2] An iterative model-based approach to cochannel speech separation
Hu, Ke
Wang, DeLiang
[J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2013,
[3] An iterative model-based approach to cochannel speech separation
Ke Hu
DeLiang Wang
[J]. EURASIP Journal on Audio, Speech, and Music Processing, 2013
[4] AN APPROACH TO SEQUENTIAL GROUPING IN COCHANNEL SPEECH
Hu, Ke
Wang, DeLiang
[J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4636 - 4639
[5] Cochannel Speech Separation Using Multi-pitch Estimation and Model Based Voiced Sequential Grouping
Li, Ming
Cao, Chuan
Wang, Di
Lu, Ping
Fu, Qiang
Yan, Yonghong
[J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 151 - 154
[6] Sequential Model-Based Ensemble Optimization
Lacoste, Alexandre
Larochelle, Hugo
Marchand, Mario
Laviolette, Francois
[J]. UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2014, : 440 - 448
[7] Model-based design of speech interfaces
Berti, S
Paternò, F
[J]. INTERACTIVE SYSTEMS: DESIGN, SPECIFICATION, AND VERIFICATION, 2003, 2844 : 231 - 244
[8] INDIRECT MODEL-BASED SPEECH ENHANCEMENT
Le Roux, Jonathan
Hershey, John R.
[J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4045 - 4048
[9] Adaptive model-based speech enhancement
Logan, B
Robinson, T
[J]. SPEECH COMMUNICATION, 2001, 34 (04) : 351 - 368
[10] A model-based approach to sequential fault diagnosis
Pietersma, Jurryt
van Gemund, Arjan J. C.
Bos, Andre
[J]. AUTOTESTCON 2005, 2005, : 621 - 627
