CLOSE-A Data-Driven Approach to Speech Separation

Cited by: 19

Authors:

Ming, Ji [1]

Srinivasan, Ramji [1]

Crookes, Danny [1]

Jafari, Ayeh [1]

Affiliations:

[1] Queens Univ Belfast, Sch Elect Elect Engn & Comp Sci, Belfast BT7 1NN, Antrim, North Ireland

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013 / Vol. 21 / No. 07

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

Co-channel speech; longest matching segment; speaker identification; speech recognition; speech separation; temporal dynamics; MODEL; ENHANCEMENT; RECOGNITION; TRACKING;

DOI:

10.1109/TASL.2013.2250959

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper studies single-channel speech separation, assuming unknown, arbitrary temporal dynamics for the speech signals to be separated. A data-driven approach is described, which matches each mixed speech segment against a composite training segment to separate the underlying clean speech segments. To advance the separation accuracy, the new approach seeks and separates the longest mixed speech segments with matching composite training segments. Lengthening the mixed speech segments to match reduces the uncertainty of the constituent training segments, and hence the error of separation. For convenience, we call the new approach Composition of Longest Segments, or CLOSE. The CLOSE method includes a data-driven approach to model long-range temporal dynamics of speech signals, and a statistical approach to identify the longest mixed speech segments with matching composite training segments. Experiments are conducted on the Wall Street Journal database, for separating mixtures of two simultaneous large-vocabulary speech utterances spoken by two different speakers. The results are evaluated using various objective and subjective measures, including the challenge of large-vocabulary continuous speech recognition. It is shown that the new separation approach leads to significant improvement in all these measures.

Pages: 1355-1368

Number of pages: 14

