LOW-LATENCY SPEAKER-INDEPENDENT CONTINUOUS SPEECH SEPARATION

Cited by: 0
Authors
Yoshioka, Takuya [1 ]
Chen, Zhuo [1 ]
Liu, Changliang [1 ]
Xiao, Xiong [1 ]
Erdogan, Hakan [1 ]
Dimitriadis, Dimitrios [1 ]
Affiliations
[1] Microsoft, One Microsoft Way, Redmond, WA 98052 USA
Keywords
Meeting transcription; continuous speech separation; speaker-independent speech separation; microphone arrays
DOI
Not available
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Speaker-independent continuous speech separation (SI-CSS) is the task of converting a continuous audio stream, which may contain overlapping voices of unknown speakers, into a fixed number of continuous signals, each of which contains no overlapping speech segments. A separated, or cleaned, version of each utterance is generated nondeterministically from one of the SI-CSS output channels, without being split up and distributed across multiple channels. A typical application scenario is transcribing multi-party conversations, such as meetings, recorded with microphone arrays. Because the output signals contain no speech overlaps, they can simply be passed to a speech recognition engine. The previous SI-CSS method uses a neural network trained with permutation invariant training and a data-driven beamformer, and thus incurs substantial processing latency. This paper proposes a low-latency SI-CSS method whose performance is comparable to that of the previous method in a microphone-array-based meeting transcription task. This is achieved (1) by using a new speech separation network architecture combined with a double-buffering scheme and (2) by performing enhancement with a set of fixed beamformers followed by a neural post-filter.
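The abstract refers to permutation invariant training (PIT), the criterion used to train the earlier SI-CSS separation network (see also related paper [36]). Below is a minimal NumPy sketch of a PIT-style loss under the assumption of a mean-squared-error criterion over magnitude spectrograms; the function name pit_mse and the array shapes are illustrative choices, not taken from the paper.

```python
# Minimal sketch of a permutation invariant training (PIT) objective.
# Assumption: MSE between estimated and reference spectrograms; names
# (pit_mse, est, ref) are illustrative, not from the paper.
import itertools
import numpy as np

def pit_mse(est, ref):
    """Permutation invariant MSE between estimated and reference sources.

    est, ref: arrays of shape (num_sources, num_frames, num_bins).
    Returns the minimum mean squared error over all source permutations,
    so the network is not penalized for emitting the sources in a
    different order than the labels.
    """
    num_sources = est.shape[0]
    best = np.inf
    for perm in itertools.permutations(range(num_sources)):
        err = np.mean((est[list(perm)] - ref) ** 2)
        best = min(best, err)
    return best

# Toy usage: two 100-frame, 257-bin magnitude spectrograms.
rng = np.random.default_rng(0)
ref = rng.random((2, 100, 257))
est = ref[::-1]              # same sources, swapped channel order
print(pit_mse(est, ref))     # ~0.0: the permutation search absorbs the swap
```

With the two or three output channels typical of CSS systems, the factorial permutation search is inexpensive. The paper's low-latency contribution lies elsewhere, in the double-buffering scheme and the fixed beamformers followed by a neural post-filter, which this sketch does not cover.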
Pages: 6980 - 6984
Page count: 5
Related papers
50 records in total
  • [31] Generalized Cyclic Transformations in Speaker-Independent Speech Recognition
    Mueller, Florian
    Belilovsky, Eugene
    Mertins, Alfred
    2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 211 - 215
  • [32] SPEAKER-INDEPENDENT PERCEPTION OF HUMAN SPEECH BY ZEBRA FINCHES
    Ohms, Verena R.
    van Heijningen, Caroline A. A.
    Gill, Arike
    Beckers, Gabriel J. L.
    ten Cate, Carel
    EVOLUTION OF LANGUAGE, PROCEEDINGS, 2010, : 467 - +
  • [33] Uighur speaker-independent speech recognition based on CDCPM
    Wang, K.L.
    2001, Science Press (38):
  • [34] Acoustic-phonetic speech parameters for speaker-independent speech recognition
    Deshmukh, O
    Espy-Wilson, CY
    Juneja, A
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 593 - 596
  • [35] A CASA APPROACH TO DEEP LEARNING BASED SPEAKER-INDEPENDENT CO-CHANNEL SPEECH SEPARATION
    Liu, Yuzhou
    Wang, DeLiang
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5399 - 5403
  • [36] PERMUTATION INVARIANT TRAINING OF DEEP MODELS FOR SPEAKER-INDEPENDENT MULTI-TALKER SPEECH SEPARATION
    Yu, Dong
    Kolbaek, Morten
    Tan, Zheng-Hua
    Jensen, Jesper
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 241 - 245
  • [37] Across-speaker Articulatory Normalization for Speaker-independent Silent Speech Recognition
    Wang, Jun
    Samal, Ashok
    Green, Jordan R.
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 1179 - 1183
  • [38] Permutation invariant training of deep models for speaker-independent multi-talker speech separation
    Takahashi, Kohei
    Shiraishi, Toshihiko
    MECHANICAL ENGINEERING JOURNAL, 2023,
  • [39] Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation
    Ephrat, Ariel
    Mosseri, Inbar
    Lang, Oran
    Dekel, Tali
    Wilson, Kevin
    Hassidim, Avinatan
    Freeman, William T.
    Rubinstein, Michael
    ACM TRANSACTIONS ON GRAPHICS, 2018, 37 (04):
  • [40] Speaker-independent speech recognition based on tree-structured speaker clustering
    Kosaka, T
    Matsunaga, S
    Sagayama, S
    COMPUTER SPEECH AND LANGUAGE, 1996, 10 (01): : 55 - 74