CLOSE-A Data-Driven Approach to Speech Separation

Cited by: 19
Authors
Ming, Ji [1]
Srinivasan, Ramji [1]
Crookes, Danny [1]
Jafari, Ayeh [1]
Affiliation
[1] Queens Univ Belfast, Sch Elect Elect Engn & Comp Sci, Belfast BT7 1NN, Antrim, North Ireland
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2013 / Vol. 21 / No. 07
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Co-channel speech; longest matching segment; speaker identification; speech recognition; speech separation; temporal dynamics; MODEL; ENHANCEMENT; RECOGNITION; TRACKING;
DOI
10.1109/TASL.2013.2250959
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper studies single-channel speech separation, assuming unknown, arbitrary temporal dynamics for the speech signals to be separated. A data-driven approach is described, which matches each mixed speech segment against a composite training segment to separate the underlying clean speech segments. To advance the separation accuracy, the new approach seeks and separates the longest mixed speech segments with matching composite training segments. Lengthening the mixed speech segments to match reduces the uncertainty of the constituent training segments, and hence the error of separation. For convenience, we call the new approach Composition of Longest Segments, or CLOSE. The CLOSE method includes a data-driven approach to model long-range temporal dynamics of speech signals, and a statistical approach to identify the longest mixed speech segments with matching composite training segments. Experiments are conducted on the Wall Street Journal database, for separating mixtures of two simultaneous large-vocabulary speech utterances spoken by two different speakers. The results are evaluated using various objective and subjective measures, including the challenge of large-vocabulary continuous speech recognition. It is shown that the new separation approach leads to significant improvement in all these measures.
Pages: 1355 - 1368
Number of pages: 14
Related papers
50 in total
  • [41] A data-driven approach to η and η′ Dalitz decays
    Escribano, Rafel
    XIITH QUARK CONFINEMENT AND THE HADRON SPECTRUM, 2017, 137
  • [42] Data-driven part-of-speech tagging of Kiswahili
    De Pauw, Guy
    de Schryver, Gilles-Maurice
    Wagacha, Peter W.
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2006, 4188 : 197 - 204
  • [43] An Overview of Data-Driven Part-of-Speech Tagging
    Tufis, Dan
    ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY, 2016, 19 (1-2): : 78 - 97
  • [44] Separation of a Mixture of Simultaneous Dual-Tracer PET Signals: A Data-Driven Approach
    Ruan, Dongsheng
    Liu, Huafeng
    IEEE TRANSACTIONS ON NUCLEAR SCIENCE, 2017, 64 (09) : 2588 - 2597
  • [45] A data-driven deep learning approach for predicting separation-induced transition of submarines
    Xuan, Yang
    Lyu, Hongqiang
    An, Wei
    Liu, Jianhua
    Liu, Xuejun
    PHYSICS OF FLUIDS, 2022, 34 (02)
  • [46] On data-driven choice of λ in nonparametric Gaussian regression via Propagation-Separation approach
    Fiebig, Ewelina Marta
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2021, 154
  • [47] Data-Driven Modeling of Aircraft Midair Separation Violation
    Stover, Oliver
    Mahadevan, Sankaran
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (09) : 15005 - 15014
  • [48] Private emotions versus social interaction: a data-driven approach towards analysing emotion in speech
    Batliner, Anton
    Steidl, Stefan
    Hacker, Christian
    Noeth, Elmar
    USER MODELING AND USER-ADAPTED INTERACTION, 2008, 18 (1-2) : 175 - 206
  • [49] A Data-Driven Approach to SAR Data-Focusing
    Guaragnella, Cataldo
    D'Orazio, Tiziana
    SENSORS, 2019, 19 (07):
  • [50] Private emotions versus social interaction: a data-driven approach towards analysing emotion in speech
    Anton Batliner
    Stefan Steidl
    Christian Hacker
    Elmar Nöth
    User Modeling and User-Adapted Interaction, 2008, 18 : 175 - 206