LOW-LATENCY SPEAKER-INDEPENDENT CONTINUOUS SPEECH SEPARATION

Cited by: 0
Authors
Yoshioka, Takuya [1 ]
Chen, Zhuo [1 ]
Liu, Changliang [1 ]
Xiao, Xiong [1 ]
Erdogan, Hakan [1 ]
Dimitriadis, Dimitrios [1 ]
Affiliations
[1] Microsoft, One Microsoft Way, Redmond, WA 98052 USA
Keywords
Meeting transcription; continuous speech separation; speaker-independent speech separation; microphone arrays
DOI
Not available
CLC Number
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Speaker-independent continuous speech separation (SI-CSS) is the task of converting a continuous audio stream, which may contain overlapping voices of unknown speakers, into a fixed number of continuous signals, each of which contains no overlapping speech segments. A separated, or cleaned, version of each utterance is emitted from one of the SI-CSS output channels nondeterministically, without being split up and distributed across multiple channels. A typical application scenario is transcribing multi-party conversations, such as meetings, recorded with microphone arrays. The output signals can be sent directly to a speech recognition engine because they contain no speech overlaps. The previous SI-CSS method uses a neural network trained with permutation invariant training together with a data-driven beamformer, and thus incurs high processing latency. This paper proposes a low-latency SI-CSS method whose performance is comparable to that of the previous method on a microphone array-based meeting transcription task. This is achieved (1) by using a new speech separation network architecture combined with a double-buffering scheme and (2) by performing enhancement with a set of fixed beamformers followed by a neural post-filter.
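The block-wise continuous processing described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the separation network is replaced by a toy sign-split, the block size (`CHUNK`) and the cross-block permutation-alignment rule are hypothetical, and the fixed beamformers and neural post-filter are omitted. What it shows is the core SI-CSS contract: a long stream is cut into blocks, each block is separated into a fixed number of candidate signals, and the candidates are stitched onto continuous output channels so that each utterance stays on one channel.

```python
import numpy as np

NUM_CHANNELS = 2   # fixed number of continuous output channels, as in the abstract
CHUNK = 1600       # hypothetical 0.1 s block at 16 kHz (illustrative, not the paper's setting)

def separate(chunk):
    """Stand-in for the speech separation network: returns NUM_CHANNELS
    candidate signals for this block. A toy sign-split keeps the sketch
    deterministic and runnable."""
    return np.stack([np.maximum(chunk, 0.0), np.minimum(chunk, 0.0)])

def align(prev, cands):
    """Pick the channel permutation most consistent with the previous
    block's outputs, so each speaker stays on one continuous channel
    across block boundaries (the stitching role of double buffering)."""
    if prev is None:
        return cands
    perms = [(0, 1), (1, 0)]
    def score(p):
        m = min(prev.shape[1], cands.shape[1])
        return sum(float(np.dot(prev[i, :m], cands[p[i], :m]))
                   for i in range(NUM_CHANNELS))
    best = max(perms, key=score)
    return cands[list(best)]

def stream_separate(audio):
    """Process a long waveform block by block into NUM_CHANNELS
    continuous, overlap-free output streams."""
    outs = [[] for _ in range(NUM_CHANNELS)]
    prev = None
    for start in range(0, len(audio), CHUNK):
        cands = align(prev, separate(audio[start:start + CHUNK]))
        for i in range(NUM_CHANNELS):
            outs[i].append(cands[i])
        prev = cands
    return [np.concatenate(o) for o in outs]
```

In this sketch the alignment score is a simple inner product against the previous block; the paper's double-buffering scheme addresses the same channel-continuity problem, but with its own network architecture and latency budget.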
Pages: 6980-6984
Page count: 5