LOW-LATENCY SPEAKER-INDEPENDENT CONTINUOUS SPEECH SEPARATION

Cited by: 0
Authors
Yoshioka, Takuya [1 ]
Chen, Zhuo [1 ]
Liu, Changliang [1 ]
Xiao, Xiong [1 ]
Erdogan, Hakan [1 ]
Dimitriadis, Dimitrios [1 ]
Affiliations
[1] Microsoft, One Microsoft Way, Redmond, WA 98052 USA
Keywords
Meeting transcription; continuous speech separation; speaker-independent speech separation; microphone arrays
DOI
Not available
CLC Number
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Speaker-independent continuous speech separation (SI-CSS) is the task of converting a continuous audio stream, which may contain overlapping voices of unknown speakers, into a fixed number of continuous signals, each of which contains no overlapping speech segments. A separated, or cleaned, version of each utterance is emitted from one of the SI-CSS output channels nondeterministically, without being split up and distributed across multiple channels. A typical application scenario is transcribing multi-party conversations, such as meetings, recorded with microphone arrays. The output signals can be sent directly to a speech recognition engine because they contain no speech overlaps. The previous SI-CSS method uses a neural network trained with permutation invariant training together with a data-driven beamformer, and thus incurs high processing latency. This paper proposes a low-latency SI-CSS method whose performance is comparable to that of the previous method on a microphone array-based meeting transcription task. This is achieved (1) by using a new speech separation network architecture combined with a double-buffering scheme and (2) by performing enhancement with a set of fixed beamformers followed by a neural post-filter.
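The block-wise continuous processing described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the separation network is replaced by a toy sign-split, the block size (`CHUNK`) and the cross-block permutation-alignment rule are hypothetical, and the fixed beamformers and neural post-filter are omitted. What it shows is the core SI-CSS contract: a long stream is cut into blocks, each block is separated into a fixed number of candidate signals, and the candidates are stitched onto continuous output channels so that each utterance stays on one channel.

```python
import numpy as np

NUM_CHANNELS = 2   # fixed number of continuous output channels, as in the abstract
CHUNK = 1600       # hypothetical 0.1 s block at 16 kHz (illustrative, not the paper's setting)

def separate(chunk):
    """Stand-in for the speech separation network: returns NUM_CHANNELS
    candidate signals for this block. A toy sign-split keeps the sketch
    deterministic and runnable."""
    return np.stack([np.maximum(chunk, 0.0), np.minimum(chunk, 0.0)])

def align(prev, cands):
    """Pick the channel permutation most consistent with the previous
    block's outputs, so each speaker stays on one continuous channel
    across block boundaries (the stitching role of double buffering)."""
    if prev is None:
        return cands
    perms = [(0, 1), (1, 0)]
    def score(p):
        m = min(prev.shape[1], cands.shape[1])
        return sum(float(np.dot(prev[i, :m], cands[p[i], :m]))
                   for i in range(NUM_CHANNELS))
    best = max(perms, key=score)
    return cands[list(best)]

def stream_separate(audio):
    """Process a long waveform block by block into NUM_CHANNELS
    continuous, overlap-free output streams."""
    outs = [[] for _ in range(NUM_CHANNELS)]
    prev = None
    for start in range(0, len(audio), CHUNK):
        cands = align(prev, separate(audio[start:start + CHUNK]))
        for i in range(NUM_CHANNELS):
            outs[i].append(cands[i])
        prev = cands
    return [np.concatenate(o) for o in outs]
```

In this sketch the alignment score is a simple inner product against the previous block; the paper's double-buffering scheme addresses the same channel-continuity problem, but with its own network architecture and latency budget.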
Pages: 6980-6984
Page count: 5