LOW-LATENCY SPEAKER-INDEPENDENT CONTINUOUS SPEECH SEPARATION

Cited by: 0
Authors
Yoshioka, Takuya [1]
Chen, Zhuo [1]
Liu, Changliang [1]
Xiao, Xiong [1]
Erdogan, Hakan [1]
Dimitriadis, Dimitrios [1]
Affiliations
[1] Microsoft, One Microsoft Way, Redmond, WA 98052 USA
Keywords
Meeting transcription; continuous speech separation; speaker-independent speech separation; microphone arrays
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speaker-independent continuous speech separation (SI-CSS) is the task of converting a continuous audio stream, which may contain overlapping voices of unknown speakers, into a fixed number of continuous signals, each of which contains no overlapping speech segments. A separated, or cleaned, version of each utterance is emitted from one of the SI-CSS output channels, chosen nondeterministically, without being split up and distributed across multiple channels. A typical application scenario is transcribing multi-party conversations, such as meetings, recorded with microphone arrays. Because the output signals contain no speech overlaps, they can be sent directly to a speech recognition engine. The previous SI-CSS method uses a neural network trained with permutation invariant training together with a data-driven beamformer, and thus incurs high processing latency. This paper proposes a low-latency SI-CSS method whose performance is comparable to that of the previous method in a microphone array-based meeting transcription task. This is achieved (1) by using a new speech separation network architecture combined with a double buffering scheme and (2) by performing enhancement with a set of fixed beamformers followed by a neural post-filter.
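
The abstract names permutation invariant training as the training criterion of the previous method but does not spell it out. As a rough illustration only, the general idea can be sketched as follows: the loss is evaluated under every assignment of network outputs to reference sources, and the smallest value is kept. The function names (`mse`, `pit_loss`), the mean-squared-error criterion, and the toy signals are assumptions made for this sketch; they are not taken from the paper.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, references):
    """Return (minimum loss, best permutation) over all assignments.

    best_perm[i] is the index of the reference matched to estimate i.
    The search is exhaustive, so it is only practical for a small
    number of output channels.
    """
    n = len(estimates)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        loss = sum(mse(estimates[i], references[perm[i]]) for i in range(n)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example (hypothetical data): the two outputs are correct but swapped.
refs = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
ests = [[0.0, 2.0, 0.0], [1.0, 0.0, 1.0]]
loss, perm = pit_loss(ests, refs)
print(loss, perm)
```

Because the criterion ignores the order of the outputs, a network trained this way is not penalized for placing a given speaker on either channel, which is consistent with the abstract's remark that each utterance comes out of one of the output channels nondeterministically.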
Pages: 6980-6984
Page count: 5