CLOSE-A Data-Driven Approach to Speech Separation

Cited by: 19
Authors
Ming, Ji [1]
Srinivasan, Ramji [1]
Crookes, Danny [1]
Jafari, Ayeh [1]
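Affiliations
[1] Queen's University Belfast, School of Electronics, Electrical Engineering and Computer Science, Belfast BT7 1NN, Antrim, Northern Ireland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)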
Keywords
Co-channel speech; longest matching segment; speaker identification; speech recognition; speech separation; temporal dynamics; MODEL; ENHANCEMENT; RECOGNITION; TRACKING
DOI
10.1109/TASL.2013.2250959
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper studies single-channel speech separation, assuming unknown, arbitrary temporal dynamics for the speech signals to be separated. A data-driven approach is described, which matches each mixed speech segment against a composite training segment to separate the underlying clean speech segments. To advance the separation accuracy, the new approach seeks and separates the longest mixed speech segments with matching composite training segments. Lengthening the mixed speech segments to match reduces the uncertainty of the constituent training segments, and hence the error of separation. For convenience, we call the new approach Composition of Longest Segments, or CLOSE. The CLOSE method includes a data-driven approach to model long-range temporal dynamics of speech signals, and a statistical approach to identify the longest mixed speech segments with matching composite training segments. Experiments are conducted on the Wall Street Journal database, for separating mixtures of two simultaneous large-vocabulary speech utterances spoken by two different speakers. The results are evaluated using various objective and subjective measures, including the challenge of large-vocabulary continuous speech recognition. It is shown that the new separation approach leads to significant improvement in all these measures.
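The longest-matching-segment idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes one scalar feature per frame, composes two training segments by simple addition, and accepts a match when every frame agrees within a tolerance, whereas the paper uses a statistical criterion to identify the longest match. All function and variable names below are hypothetical.

```python
def match_length(mixed, start, train_a, i, train_b, j, tol=0.0):
    """Number of consecutive frames, from mixed[start], that equal
    train_a[i + n] + train_b[j + n] within tol (toy additive composition)."""
    n = 0
    while (start + n < len(mixed)
           and i + n < len(train_a)
           and j + n < len(train_b)
           and abs(mixed[start + n] - (train_a[i + n] + train_b[j + n])) <= tol):
        n += 1
    return n


def longest_composite_match(mixed, start, train_a, train_b, tol=0.0):
    """Exhaustive search for the pair of training offsets (i, j) whose
    composite segment matches the longest run of mixed frames.
    Returns (length, i, j); length is 0 if nothing matches."""
    best = (0, -1, -1)
    for i in range(len(train_a)):
        for j in range(len(train_b)):
            n = match_length(mixed, start, train_a, i, train_b, j, tol)
            if n > best[0]:
                best = (n, i, j)
    return best


def count_matches(mixed, start, length, train_a, train_b, tol=0.0):
    """How many offset pairs (i, j) match at least `length` frames."""
    return sum(
        1
        for i in range(len(train_a))
        for j in range(len(train_b))
        if match_length(mixed, start, train_a, i, train_b, j, tol) >= length
    )


# Toy data (made up for illustration): two "speakers" and a two-frame mixture.
train_a = [1, 2, 1, 3]
train_b = [5, 4, 5, 6]
mixed = [train_a[2] + train_b[2], train_a[3] + train_b[3]]  # [6, 9]

one_frame = count_matches(mixed, 0, 1, train_a, train_b)   # several candidate pairs
two_frames = count_matches(mixed, 0, 2, train_a, train_b)  # fewer candidate pairs
best = longest_composite_match(mixed, 0, train_a, train_b)
```

On this toy data, a single mixed frame is explained by several pairs of training frames, while the two-frame segment is explained by only one pair. This mirrors the abstract's claim that lengthening the matched segment reduces the uncertainty of the constituent training segments.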
Pages: 1355-1368
Page count: 14