LOW-LATENCY SPEAKER-INDEPENDENT CONTINUOUS SPEECH SEPARATION

Cited by: 0
Authors
Yoshioka, Takuya [1 ]
Chen, Zhuo [1 ]
Liu, Changliang [1 ]
Xiao, Xiong [1 ]
Erdogan, Hakan [1 ]
Dimitriadis, Dimitrios [1 ]
Affiliations
[1] Microsoft, One Microsoft Way, Redmond, WA 98052 USA
Keywords
Meeting transcription; continuous speech separation; speaker-independent speech separation; microphone arrays
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline Code
070206; 082403
Abstract
Speaker-independent continuous speech separation (SI-CSS) is the task of converting a continuous audio stream, which may contain overlapping voices of unknown speakers, into a fixed number of continuous signals, each of which contains no overlapping speech segments. A separated, or cleaned, version of each utterance is generated on one of the SI-CSS output channels, chosen nondeterministically, without being split up and distributed across multiple channels. A typical application scenario is transcribing multi-party conversations, such as meetings, recorded with microphone arrays. Because the output signals contain no speech overlaps, they can be fed directly to a speech recognition engine. The previous SI-CSS method uses a neural network trained with permutation invariant training together with a data-driven beamformer and therefore incurs substantial processing latency. This paper proposes a low-latency SI-CSS method whose performance is comparable to that of the previous method on a microphone-array-based meeting transcription task. This is achieved (1) by using a new speech separation network architecture combined with a double-buffering scheme and (2) by performing enhancement with a set of fixed beamformers followed by a neural post-filter.
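The double-buffering idea in (1) can be pictured as chunk-wise processing with an overlap that carries the channel assignment from one chunk to the next, so that each speaker stays on the same output stream across chunk boundaries. The sketch below is only an illustration of that idea under stated assumptions: the function names (separate_chunk, align_channels, continuous_separation), the correlation-based channel alignment, and the chunk/hop sizes are hypothetical stand-ins, not the paper's actual architecture or implementation.

```python
import numpy as np

def separate_chunk(chunk):
    # Placeholder for a chunk-wise separation network with two output
    # channels; here it merely duplicates the mixture so the sketch runs.
    return np.stack([chunk, chunk])

def align_channels(prev_tail, cur_head):
    # Pick the channel order of the current chunk that best matches the
    # previous chunk's output over the overlapping samples.
    straight = sum(float(np.dot(prev_tail[c], cur_head[c])) for c in range(2))
    swapped = sum(float(np.dot(prev_tail[c], cur_head[1 - c])) for c in range(2))
    return np.array([0, 1]) if straight >= swapped else np.array([1, 0])

def continuous_separation(mixture, chunk_len=4800, hop=2400):
    # Process a long mixture in overlapping chunks; the overlap region acts
    # as the second buffer that hands the channel assignment to the next chunk.
    # Any tail shorter than a full chunk is left untouched in this sketch.
    out = np.zeros((2, len(mixture)))
    overlap = chunk_len - hop
    prev = None
    for start in range(0, len(mixture) - chunk_len + 1, hop):
        est = separate_chunk(mixture[start:start + chunk_len])  # shape (2, chunk_len)
        if prev is None:
            out[:, start:start + chunk_len] = est
        else:
            perm = align_channels(prev[:, chunk_len - overlap:], est[:, :overlap])
            est = est[perm]
            out[:, start + overlap:start + chunk_len] = est[:, overlap:]
        prev = est
    return out

# Example: 10 s of 16 kHz audio split into 300 ms chunks with a 150 ms hop.
if __name__ == "__main__":
    mix = np.random.randn(160000)
    streams = continuous_separation(mix, chunk_len=4800, hop=2400)
    print(streams.shape)  # (2, 160000)
```

The overlap-based hand-over is what keeps the per-chunk channel permutation consistent over an arbitrarily long recording; the actual method additionally applies fixed beamformers and a neural post-filter to the aligned streams, which this sketch omits.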
Pages: 6980-6984
Number of pages: 5
Related Papers
50 records in total
  • [41] Speaker Adversarial Neural Network (SANN) for Speaker-independent Speech Emotion Recognition
    Fahad, Md Shah
    Ranjan, Ashish
    Deepak, Akshay
    Pradhan, Gayadhar
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (11) : 6113 - 6135
  • [43] Binaural Multichannel Blind Speaker Separation With a Causal Low-Latency and Low-Complexity Approach
    Westhausen, Nils L.
    Meyer, Bernd T.
    IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2024, 5 : 238 - 247
  • [44] Implementing a high accuracy speaker-independent continuous speech recognizer on a fixed-point DSP
    Gong, YF
    Kao, YH
2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000: 3686 - 3689
  • [45] Cooperative supervised and unsupervised learning algorithm for phoneme recognition in continuous speech and speaker-independent context
    Arous, N
    Ellouze, N
    NEUROCOMPUTING, 2003, 51 : 225 - 235
  • [46] SPEAKER-INDEPENDENT CONTINUOUS SPEECH RECOGNITION USING FUZZY PARTITION MODEL (FPM) AND LR PARSERS
    FUKAZAWA, K
    KATO, Y
    SUGIYAMA, M
    SYSTEMS AND COMPUTERS IN JAPAN, 1994, 25 (14) : 32 - 48
  • [47] An automatic speech recognition system with speaker-independent identification support
    Caranica, Alexandru
    Burileanu, Corneliu
    ADVANCED TOPICS IN OPTOELECTRONICS, MICROELECTRONICS, AND NANOTECHNOLOGIES VII, 2015, 9258
  • [48] Speaker-independent telephone speech recognition system: the VCS TeleRec
    Hunt, Alan
Speech Technology, 1988, 4 (02): 80 - 82
  • [49] Speaker-Independent Spectral Enhancement for Bone-Conducted Speech
    Cheng, Liangliang
    Dou, Yunfeng
    Zhou, Jian
    Wang, Huabin
    Tao, Liang
    ALGORITHMS, 2023, 16 (03)
  • [50] A SPEAKER-INDEPENDENT SPEECH RECOGNITION SYSTEM FOR TELEPHONE NETWORK APPLICATIONS
    TRNKA, R
    REVUE TECHNIQUE THOMSON-CSF, 1984, 16 (04): : 847 - 861